JOHANNES KEPLER UNIVERSITY LINZ

Submitted by
Marius-Constantin Dinu
11728074

Submitted at
Institute for Machine Learning

Parameter Choice and Neuro-Symbolic Approaches for Deep Domain-Invariant Learning

Thesis Supervisor / First Evaluator
Sepp Hochreiter

Acknowledgments

My academic journey has been supported by countless individuals - researchers advancing our field, family, colleagues, and friends. Though too numerous to name, their contributions are deeply appreciated.
Firstly, I am deeply indebted to Sepp Hochreiter for establishing this institute, which has significantly shaped my life. His dedication to sharing his knowledge and experience with me has been invaluable, and his expertise continues to inspire me. Words cannot fully capture my appreciation.
Similarly, I express my heartfelt thanks to Werner Zellinger and Sergei Pereverzyev. Werner, in particular, has been a vital mentor during the latter part of my studies. His insightful discussions and advice have been incredibly beneficial. Likewise, Sergei's contributions have been significant, and I've learned a great deal from his work and insights.
I extend my gratitude to my colleagues and friends at the Institute for Machine Learning at Johannes Kepler University. I cherish the engaging discussions, the supportive and friendly environment, the lasting friendships, the exciting conference experiences, and the countless enjoyable moments we've shared. Their support has been critical in navigating a field that is both promising and increasingly challenging, competitive, and sometimes discouraging to pursue a PhD in. I am grateful to Gary Marcus, whose work and our discussions significantly inspired and contributed to the ideas in this thesis. I am also thankful to all my co-authors and peers, whom I thoroughly enjoyed collaborating with. Special mentions go to Claudiu Leoveanu-Condrei, Markus Hofmarcher, Markus Holzleitner, Andreas Radler, Kajetan Schweighofer, José Arjona-Medina and Lukas Gruber. Furthermore, I am grateful to Michael Laux, whose profound wisdom and knowledge, despite brief interactions, offered invaluable guidance and inspiration that will shape my work and perspective.
Finally, I want to express my deepest gratitude to my parents, friends, family, and especially Laura. Your unwavering support has been my cornerstone, without which I could not have pursued my academic goals.

Abstract

As artificial intelligence (AI) systems advance, we move towards broad AI: systems capable of performing well on diverse tasks, understanding context, and adapting rapidly to new scenarios. A central challenge for broad AI systems is to generalize across tasks in related domains while remaining robust to distribution shifts. Neuro-symbolic (NeSy) AI bridges the gap between the symbolic and sub-symbolic paradigms to address these challenges. This enables us to create adaptable, generalizable, and more interpretable systems. These characteristics are fundamental for broad AI systems, which aim to solve a wide range of tasks across various domains. Consequently, the development of broad AI requires advancements in domain adaptation (DA), enabling models trained on source domains to generalize effectively to unseen target domains. Traditional approaches often rely on parameter optimization, fine-tuning, and representation learning methods, which can be impractical due to high costs and the risk of catastrophic forgetting. In contrast, NeSy AI systems use multiple models and methods to generalize to unseen domains and maintain performance across varying conditions. However, building scalable and generalizable hybrid systems remains challenging.
In this work, we analyze common DA and NeSy approaches with a focus on deep domain-invariant learning. This analysis extends to real-world challenges, such as adapting to continuously changing domains, handling missing modalities, and dealing with large domain gaps between source and target domains. We showcase state-of-the-art model-selection and parameter choice methods for scenarios with limited samples where gradient-based optimization is feasible. For cases where model tuning is infeasible, we introduce domain-specific adaptations without gradient-based updates, connecting in-context learning to domain-invariant learning. This approach presents NeSy methodologies that enable adaptation in previously challenging situations. This work thereby establishes a framework for scalable and generalizable broad AI systems applicable across various problem settings. Furthermore, it demonstrates how symbolic reasoning and large language models (LLMs) can build universal computational graphs that generalize across domains and problems, contributing to more adaptable, generalizable, and interpretable AI approaches for real-world applications.

Marius-Constantin Dinu

PUBLICATIONS

[1] M.-C. Dinu, C. Leoveanu-Condrei, M. Holzleitner, W. Zellinger, and S. Hochreiter, "SymbolicAI: A framework for logic-based approaches combining generative models and solvers", in Conference on Lifelong Learning Agents, PMLR, 2024.
[2] M.-C. Dinu, C. Leoveanu-Condrei, M. Holzleitner, W. Zellinger, and S. Hochreiter, "SymbolicAI: A framework for logic-based approaches combining generative models and solvers", GenAI4DM Workshop at The Twelfth International Conference on Learning Representations, 2024.
[3] A. Patel, M. Hofmarcher, C. Leoveanu-Condrei, M.-C. Dinu, C. Callison-Burch, and S. Hochreiter, "Large language models can self-improve at web agent tasks", in Advances in Neural Information Processing Systems (under review), 2024.
[4] M.-C. Dinu, M. Holzleitner, M. Beck, H. D. Nguyen, A. Huber, H. Eghbal-zadeh, B. A. Moser, S. Pereverzyev, S. Hochreiter, and W. Zellinger, "Addressing parameter choice issues in unsupervised domain adaptation by aggregation", in International Conference on Learning Representations, 2023.
[5] M.-C. Dinu*, M. Hofmarcher*, V. P. Patil, M. Dorfer, P. M. Blies, J. Brandstetter, J. Arjona-Medina, and S. Hochreiter, "XAI and strategy extraction via reward redistribution", in xxAI - Beyond Explainable Artificial Intelligence: State-of-the-Art and Future Challenges, ser. Lecture Notes in Artificial Intelligence, vol. LNAI 13200, Springer International Publishing, May 2022.
[6] V. Patil*, M. Hofmarcher*, M.-C. Dinu, M. Dorfer, P. M. Blies, J. Brandstetter, J. Arjona-Medina, and S. Hochreiter, "Align-RUDDER: Learning from few demonstrations by reward redistribution", in Proceedings of the 39th International Conference on Machine Learning, ser. Proceedings of Machine Learning Research, vol. 162, PMLR, Jul. 2022, pp. 17531-17572.
[7] K. Schweighofer*, A. Radler*, M.-C. Dinu*, M. Hofmarcher, V. Patil, A. Bitto-Nemling, H. Eghbal-Zadeh, and S. Hochreiter, "A dataset perspective on offline reinforcement learning", in First Conference on Lifelong Learning Agents, Aug. 2022.
[8] R. Siripurapu, V. P. Patil, K. Schweighofer, M.-C. Dinu, T. Schmied, L. E. F. Diez, M. Holzleitner, H. Eghbal-Zadeh, M. K. Kopp, and S. Hochreiter, "InfODist: Online distillation with informative rewards improves generalization in curriculum learning", in Deep Reinforcement Learning Workshop NeurIPS, 2022.
[9] C. A. Steinparz, T. Schmied, F. Paischer, M.-C. Dinu, V. P. Patil, A. Bitto-Nemling, H. Eghbal-zadeh, and S. Hochreiter, "Reactive exploration to cope with non-stationarity in lifelong reinforcement learning", in Conference on Lifelong Learning Agents, PMLR, 2022, pp. 441-469.
[10] K. Schweighofer, M. Hofmarcher, M.-C. Dinu, P. Renz, A. Bitto-Nemling, V. P. Patil, and S. Hochreiter, "Understanding the effects of dataset characteristics on offline reinforcement learning", in Deep Reinforcement Learning Workshop NeurIPS 2021, Dec. 2021.
[11] W. Zellinger, N. Shepeleva, M.-C. Dinu, H. Eghbal-zadeh, H. D. Nguyen, B. Nessler, S. Pereverzyev, and B. A. Moser, "The balancing principle for parameter choice in distance-regularized domain adaptation", in Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., vol. 34, Curran Associates, Inc., 2021, pp. 20798-20811.

* Equal contribution.

[12] M. Holzleitner, J. A. Arjona-Medina, M.-C. Dinu, A. Vall, L. Gruber, and S. Hochreiter, "A two time-scale update rule ensuring convergence of episodic reinforcement learning algorithms at the example of rudder", NeurIPS Optimization Foundations for Reinforcement Learning Workshop, 2019.

TEACHING

  • Lecturer at Johannes Kepler University Linz
Exercise in Deep Reinforcement Learning (365.250)
  • Lecturer at Johannes Kepler University Linz
Summer 2022
Exercise in Deep Reinforcement Learning (365.250)
  • Lecturer at Johannes Kepler University Linz
Summer 2023
Exercise in Deep Reinforcement Learning (365.250)

Contents

1 Introduction 1
1.1 Domain Shift, Domain Adaptation and Domain-Invariant Learning 4
1.2 Model Selection and Parameter Choice Methods for Unsupervised Domain Adaptation 6
1.3 In-Context Learning 7
1.4 A Neuro-Symbolic Perspective on Large Language Models 22
1.5 List of Publications 24
2 Selected Publications 27
2.1 The balancing principle for parameter choice in distance-regularized domain adaptation 28
2.2 Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation 56
2.3 SymbolicAI: A framework for logic-based approaches combining generative models and solvers 108
3 Conclusion and Outlook 155
3.1 Future Work 155
A Glossary 167

Chapter 1 Introduction

As artificial intelligence (AI) systems advance, we are moving towards broad AI capable of performing well on diverse tasks, understanding context, and adapting rapidly to new scenarios. Broad AI, as conceptualized by Hochreiter (2022), represents a significant advancement over current narrow AI systems. It aims to create sophisticated and adaptive systems characterized by enhanced capabilities in knowledge transfer and interaction, adaptability and robustness, abstraction and advanced reasoning, and efficiency. These systems are designed to quickly adapt to new situations, customers, products, processes, workflows, or sensory inputs, addressing the limitations of current deep learning approaches which often underperform in real-world applications due to their data-hungry nature and limited transfer capabilities. Key challenges in developing such broad AI systems include distribution shifts and generalization. Addressing these challenges is crucial for creating AI systems that can solve a wide range of tasks across various domains and overcome the limitations that currently hinder the widespread adoption of AI in critical real-world scenarios. Neuro-symbolic (NeSy) AI emerges as a promising approach to tackle these challenges by bridging the gap between symbolic and sub-symbolic paradigms of AI (Marcus, 2001, 2020). It combines the strengths of both approaches while mitigating their individual weaknesses. Symbolic AI excels in logical reasoning, interpretability, and explicit knowledge representation, but struggles with adaptability and learning from raw data. Sub-symbolic AI, exemplified by deep neural networks, excels in pattern recognition and learning from large datasets, but often lacks interpretability and struggles with abstract reasoning. 
NeSy AI aims to create synergy between these approaches by integrating neural learning with symbolic knowledge and reasoning (Garcez et al., 2008, 2015; Besold et al., 2017; Garcez et al., 2019; Garcez and Lamb, 2020; Lamb et al., 2020; Hamilton et al., 2022; Yu et al., 2023). This integration is challenging due to the fundamentally different nature of these paradigms, which has historically limited their combination. However, recent advances in machine learning - specifically due to advances made by large language models (LLMs) (Vaswani et al., 2017; Wei et al., 2022; Singhal et al., 2023; Schick et al., 2023) - and knowledge representation (Augusto, 2022) have made it possible to create hybrid systems that leverage the strengths of both approaches. LLMs provide a comprehensive natural language understanding and generalization capability, and enable a seamless combination of logical reasoning with statistical approaches. NeSy AI systems can potentially offer improved adaptability, robustness, generalizability, and interpretability compared to purely symbolic or sub-symbolic systems. These characteristics are essential for developing broad AI systems, which are designed to solve a wide range of tasks across various domains and adapt to new challenges without extensive re-training or fine-tuning. Therefore, to create broad AI systems, it is essential to advance the field of DA and, in particular, domain-invariant learning.
Domain adaptation (DA) focuses on training a model on data from a set of source domains to help generalize the model to unseen target domains (Widmer and Kubat, 1996; Ben-David et al., 2010; Liu et al., 2022). DA addresses the domain shift between the source and target domains, which is useful in scenarios where labeled data is scarce or expensive to obtain, making it impractical to re-train models from scratch for each new domain. Domain-invariant learning is a DA methodology that aims to learn shared features that are robust and consistent across different domains. Core techniques to domain-invariant learning include fine-tuning (Sugiyama et al., 2007; You et al., 2019; Zellinger et al., 2021), representation learning (Tzeng et al., 2014; Ganin et al., 2016; Sun et al., 2017; Zellinger et al., 2017; Chen et al., 2020; Rahman et al., 2020; Liu and Xue, 2021; Zhu et al., 2021), and novel parameter choice methods (Sugiyama et al., 2007, 2012; Saito et al., 2021; Zellinger et al., 2021; Musgrave et al., 2021). Parameter choice methods involve selecting optimal hyperparameters, regularization techniques, and model architectures to effectively transfer knowledge from source to target domains. Fine-tuning involves adjusting pre-trained models to adapt to new domains, while representation learning enables the learning of transferable features or embeddings that capture domain-invariant information. In the context of DA, both approaches aim to allow models to adapt to new domains by utilizing shared representations, while minimizing the need for extensive retraining.
However, several problems appear in practice: fine-tuning models that exceed 100 billion parameters (Brown et al., 2020; Google, 2023; Touvron et al., 2023; Jiang et al., 2023) is often too computationally expensive or restricted due to proprietary limitations on accessing model weights. Additionally, real-world data often changes at a rate that makes repeated re-training intractable. Therefore, these approaches are not always feasible or applicable. Furthermore, representation learning approaches lack interpretability and suffer from catastrophic forgetting (Hebb, 1949; Carpenter and Grossberg, 1987; McCloskey and Cohen, 1989; Mermillod et al., 2013; Ehret et al., 2020) when the model is exposed to new domains, which stems from the underlying shift in data distribution across domains. Catastrophic forgetting refers to a model forgetting previously learned information upon learning new information, particularly in sequential learning tasks.
In contrast to traditional single-model approaches, NeSy AI systems leverage multiple models and methods to address domain-specific requirements and generalize across varying conditions. They can select between parametric, non-parametric, and symbolic methods, ranging from symbolic reasoning to powerful pattern recognition abilities, structured knowledge bases, and search engines. This diverse toolkit enables more adaptable and interpretable methodologies. A hybrid paradigm also enables the formulation of higher-level abstractions and logical expressions through dedicated symbolic approaches. It incorporates learning-for-reasoning methods, which treat learning as an accelerator for reasoning by employing deep neural networks to reduce the search space of symbolic systems (Silver et al., 2016, 2017b,a; Qu and Tang, 2019; Schrittwieser et al., 2020); reasoning-for-learning methods, which view reasoning as a way to regularize learning, with symbolic knowledge acting as a guiding constraint that oversees machine learning tasks (Hu et al., 2016; Xu et al., 2018); and learning-reasoning methods, which establish a symbiotic relationship between learning and reasoning to boost problem-solving capabilities (Donadello et al., 2017; Manhaeve et al., 2018; Mao et al., 2019; Ellis, 2023). While NeSy AI systems offer numerous advantages, their implementation presents significant challenges. In practice, building such hybrid systems is difficult due to the inherent complexity of integrating disparate symbolic and neural components, as well as the difficulty of designing interfaces that allow seamless interaction between these components. This complexity often results in ad-hoc implementations that scale poorly and do not generalize across a variety of tasks and domains. The lack of standardized frameworks for NeSy integration further exacerbates these issues, leading to solutions that are often specific to particular problem domains and difficult to adapt to new scenarios.
Motivated by these limitations and the potential of NeSy AI, my work analyzes common DA and NeSy approaches, with a particular focus on deep domain-invariant learning. This analysis is conducted from two complementary perspectives: First, we address DA issues faced when dealing with restricted sample sizes and where gradient-based optimization is feasible, and propose novel model-selection and parameter choice methods (Zellinger et al., 2021; Dinu et al., 2023). Second, we establish a connection between in-context learning (Vaswani et al., 2017) - the ability of LLMs to adapt to new tasks through prompting - and domain-invariant learning. This approach focuses on NeSy methodologies in situations where tuning models is infeasible, introducing domain-specific adaptations without gradient-based updates (Dinu, 2022; Dinu et al., 2024a,b).
As a result, this work summarizes the following contributions:
  • novel model-selection and parameter choice methods for unsupervised domain adaptation,
  • a novel NeSy approach based on LLMs for situations where fine-tuning models is infeasible,
  • a scalable and generalizable framework for broad AI systems that enables domain-specific adaptations,
  • novel benchmarks for parameter choice methodologies and for LLM-based NeSy methodologies,
  • a demonstration of how symbolic reasoning and LLMs can be connected to build universal computational graphs that generalize across domains and problems.
Therefore, our research contributes to the development of more transferable, robust, and interpretable AI approaches that meet the demands of real-world applications, such as robotics, personalized healthcare systems, and business intelligence platforms (Tesla, 2022; NVIDIA, 2024).

1.1 Domain Shift, Domain Adaptation and Domain-Invariant Learning

The challenges of domain shift and the scarcity of labeled data present significant constraints to traditional deep learning approaches. To illustrate, in the automotive industry, models designed for classification and segmentation tasks are initially trained in a simulated setting, potentially with full access to labeled data, and aimed at being transferred to real-world applications. Despite the profusion of unlabeled real-world data, for instance, video recordings under various environmental conditions (Tesla, 2022), this task remains challenging due to intricate domain shifts between the simulated data and the target domain (Dosovitskiy et al., 2017).
The domain shift refers to the changes in the data distribution between the source domain (distribution $p$), where the model is trained, and the target domain (distribution $q$), where the model is applied. This shift can manifest in various forms, such as changes in the feature distribution ($p_X \neq q_X$), the label distribution ($p_Y \neq q_Y$), and the conditional distribution of labels given features ($p_{Y|X} \neq q_{Y|X}$).
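For intuition, the covariate-shift setting (differing input marginals $p_X \neq q_X$ with a shared labeling mechanism $p_{Y|X} = q_{Y|X}$) can be simulated synthetically. The sketch below is a toy illustration of this definition only; the distributions and the `label` function are our own hypothetical choices, not taken from any cited work.

```python
import numpy as np

rng = np.random.default_rng(0)

def label(x):
    """Shared labeling mechanism: identical under p and q (covariate shift)."""
    return np.sin(x) + 0.1 * rng.standard_normal(x.shape)

# Different input marginals p_X != q_X, same conditional p_{Y|X} = q_{Y|X}:
x_source = rng.normal(loc=-1.0, scale=1.0, size=1000)   # draws from p_X
x_target = rng.normal(loc=+2.0, scale=1.0, size=1000)   # draws from q_X
y_source, y_target = label(x_source), label(x_target)
```

A model fit on `(x_source, y_source)` sees almost no inputs near the target region around $x = 2$, which is precisely why naive source training degrades under domain shift.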
To address these challenges systematically, we turn to the field of unsupervised domain adaptation (UDA). UDA provides a formal framework for transferring knowledge from a labeled source domain to an unlabeled target domain, directly tackling the issues of domain shift and label scarcity. In the following section, we present the mathematical foundations of UDA, which will serve as the basis for our subsequent analysis and proposed methods.
Unsupervised Domain Adaptation. Let $X \subseteq \mathbb{R}^{d_1}$ be a compact input space and $Y \subseteq \mathbb{R}^{d_2}$ be a compact label space. Following Ben-David et al. (2010), we consider two datasets: a source dataset $(\mathbf{x}, \mathbf{y}) = ((x_1, y_1), \ldots, (x_n, y_n)) \in (X \times Y)^n$ drawn independently according to some source distribution (probability measure) $p$ on $X \times Y$, and an unlabeled target dataset $\mathbf{x} = (x_1, \ldots, x_m) \in X^m$ with elements drawn independently according to the marginal $q_X$. The marginal distribution of $p$ on $X$ is analogously denoted by $p_X$. The goal is to find $f : X \to Y$ with small target risk $R_q(f)$. For simplicity, we consider the expected target risk $R_q(f) = \int_{X \times Y} \| f(x) - y \|_Y^2 \, \mathrm{d}q(x, y)$ with the least-squares loss. From this formulation, we can see that UDA aims to adapt a model trained on a labeled source domain to perform well on a new, unlabeled target domain, effectively addressing the challenges of domain shift and the absence of labeled data in the target domain (Ben-David et al., 2010; Liu et al., 2022).
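The expected target risk above can be approximated by its empirical counterpart on target samples. The helper below is a hypothetical sketch for intuition only; in the UDA setting the target labels are unobserved, so this quantity serves as an oracle for analysis rather than a practical training signal.

```python
import numpy as np

def empirical_target_risk(f, x, y):
    """Monte-Carlo estimate of R_q(f) = E_q ||f(x) - y||_Y^2 with the
    least-squares loss. `f` maps a batch of inputs to predictions; `x`
    and `y` are samples from the target distribution q. Note: in UDA
    the target labels are unavailable, so this is an oracle quantity."""
    residuals = f(x) - y
    return float(np.mean(np.sum(residuals ** 2, axis=-1)))
```

A predictor that reproduces the labeling function exactly attains zero empirical risk, and any deviation contributes quadratically, mirroring the integral definition above.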
Domain-invariant learning is a general principle for constructing DA algorithms that focuses on learning a feature extractor $\varphi : X \to \mathbb{R}^d$ that maps input data $x \in X$ to a shared $d$-dimensional representation space $\mathbb{R}^d$ with $d \geq 1$. The goal is to obtain representations $\varphi(x)$ that are similar across both the source domain and the target domain. More specifically, we aim to learn $\varphi$ such that the associated probability distributions of $\varphi(X_p)$ and $\varphi(X_q)$ are as close as possible, where $X_p$ and $X_q$ are random variables with distributions $p$ and $q$, respectively. Closeness is often measured using distribution distance metrics or divergences such as the Maximum Mean Discrepancy (Gretton et al., 2012), the Wasserstein distance, and the Central Moment Discrepancy (Zellinger et al., 2017). See Figure 1.1 for a visual illustration of domain-invariant learning.
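To make the notion of distributional closeness concrete, the sketch below computes the biased estimator of the squared Maximum Mean Discrepancy (Gretton et al., 2012) between batches of source and target features under a Gaussian kernel. The function names and the fixed bandwidth `gamma` are illustrative choices on our part, not prescribed by the cited works.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

def mmd2(feats_src, feats_tgt, gamma=1.0):
    """Biased estimator of the squared Maximum Mean Discrepancy between
    the empirical distributions of phi(X_p) and phi(X_q): it vanishes
    exactly when the two empirical kernel mean embeddings coincide."""
    return (rbf_kernel(feats_src, feats_src, gamma).mean()
            - 2.0 * rbf_kernel(feats_src, feats_tgt, gamma).mean()
            + rbf_kernel(feats_tgt, feats_tgt, gamma).mean())
```

In domain-invariant learning, such a term is typically added to the training objective so that the feature extractor $\varphi$ is penalized whenever the source and target representations drift apart.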
Figure 1.1: Illustration of domain-invariant learning. Source and target domain features are transformed into domain-invariant features, which are then made class discriminative. This enables the model to distinguish between different classes while maintaining consistent performance across various domains.

1.2 Model Selection and Parameter Choice Methods for Unsupervised Domain Adaptation

Implementing effective UDA methods requires careful selection of model architectures and hyperparameters. This section examines approaches to model selection and parameter choice that aim to optimize UDA performance across diverse domains, focusing on how these methods can be applied to various UDA techniques such as moment matching (Zellinger et al., 2017; Gretton et al., 2012), divergence-based (Li et al., 2020), or discriminative approaches (Ganin et al., 2016), among others.
Parameter Choice Problem. Typical approaches to parameter choice in UDA first involve training a sequence of $m$ models with different hyperparameter settings. This often includes modifying parameters such as the learning rate or the weighting factors of loss terms. A current best practice for a second step is then to apply a model selection method to obtain the best-performing model with the minimum risk on the target domain. However, state-of-the-art model selection methods (Sugiyama et al., 2007; You et al., 2019; Zellinger et al., 2021) can only perform as well as the best model within the sequence and disregard the remaining $m-1$ trained models, despite the computational power and time invested in training them. Furthermore, the choice of models and parameters often hinges on the task-specific characteristics of the dataset, necessitating dedicated experimentation.
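As a point of reference, the selection step can be sketched as importance-weighted validation in the spirit of Sugiyama et al. (2007): each candidate's source risk is reweighted by an estimate of the density ratio $\beta(x) = \mathrm{d}q_X / \mathrm{d}p_X$, which under covariate shift makes the weighted source risk approximate the unobservable target risk. This is a simplified illustration, not the exact procedure of the cited works.

```python
import numpy as np

def select_model(models, x_src, y_src, beta):
    """Importance-weighted validation: return the index of the candidate
    with the smallest density-ratio-weighted source risk. `models` is a
    list of prediction functions and `beta` holds precomputed estimates
    of beta(x_i) for the labeled source points."""
    risks = [float(np.mean(beta * np.sum((f(x_src) - y_src) ** 2, axis=-1)))
             for f in models]
    return int(np.argmin(risks)), risks
```

However good the density-ratio estimate, this procedure can never outperform the best single model in the sequence, which is precisely the limitation motivating the aggregation approach below.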
Parameter Choice Contribution In our work (Dinu et al., 2023), we go beyond selecting a single model from the sequence of models by computing a weighted aggregation of all models within the sequence.
More precisely, we calculate an ensemble $f_{\mathrm{agg}} = \sum_{k=1}^{m} c_k f_k$ with minimal target risk $R_q(f_{\mathrm{agg}}) = \min_{c_1, \ldots, c_m \in \mathbb{R}} R_q\big(\sum_{k=1}^{m} c_k f_k\big)$. Here $f_k$ denotes the models within the sequence and $c_k$ the aggregation coefficients. The optimal aggregation risk is less than or equal to the risk of the best model obtained by model selection, i.e., $R_q(f_{\mathrm{agg}}) \leq R_q(f_{\mathrm{sel}})$, because this model is included within the sequence of aggregated models. Using vector-valued least squares, the vector of optimal coefficients can be calculated as follows (see Section 2.2):
(1.1)  $c = G^{-1} g = \underset{(c_1,\ldots,c_m) \in \mathbb{R}^m}{\arg\min} \int_X \Big\| \sum_{k=1}^{m} c_k f_k(x) - f_q(x) \Big\|_Y^2 \, dq_X(x),$
where $f_q(x) = \int_Y y \, dq(y|x)$ is the Bayesian predictor (the model with minimal risk on the target domain). The Gram matrix $G$ consists of scalar products between the predictions of the individual models within the sequence, i.e., $G = \big(\int_X \langle f_k(x), f_u(x) \rangle_Y \, dq_X(x)\big)_{k,u=1}^{m}$, and $g = \big(\int_X \langle f_q(x), f_k(x) \rangle_Y \, dq_X(x)\big)_{k=1}^{m}$. Since calculating $f_q$ requires access to the labels of the target domain, to overcome this problem, under the assumptions of 1) a covariate shift and 2) a bounded density ratio between the marginals of the source distribution $p$ and target distribution $q$, so that $\beta(x) = \frac{dq_X}{dp_X}(x) \in [0, B]$ with some constant $B > 0$, we arrive at the following result:
(1.2)  $g = \Big(\int_X \langle f_p(x), f_k(x) \rangle_Y \, \beta(x) \, dp_X(x)\Big)_{k=1}^{m}.$
Equation (1.2) computes the expectation over scalar products between the sequence of models and the Bayesian predictor fp weighted by the density ratio β(x) to adjust for the domain shift.
To realize the computations of Equation (1.1) algorithmically, we employ the following two steps: First, we estimate the density ratio $\beta(x)$ (see e.g. Sugiyama et al. (2012) for a collection of possible methods). Second, we calculate the empirical estimators for the Gram matrix $\tilde{G} = \big(\frac{1}{t} \sum_{i=1}^{t} \langle f_k(x_i), f_u(x_i) \rangle_Y\big)_{k,u=1}^{m}$ and vector $\tilde{g} = \big(\frac{1}{s} \sum_{i=1}^{s} \beta(x_i) \langle y_i, f_k(x_i) \rangle_Y\big)_{k=1}^{m}$ based on our datasets, and obtain the aggregation coefficients $\tilde{c} = \tilde{G}^{-1} \tilde{g}$.
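The two steps above can be sketched with ordinary least squares. The `models`, `beta`, and sample arrays below are hypothetical stand-ins for a trained model sequence, a pre-estimated density ratio, and the source/target datasets; the sketch assumes scalar-valued predictors for simplicity.

```python
import numpy as np

def aggregate(models, X_target, X_source, y_source, beta):
    """Weighted model aggregation c = G^{-1} g, cf. Equation (1.1).

    models   : list of m prediction functions f_k mapping arrays to arrays
    X_target : t unlabeled target-domain samples (for the Gram matrix G)
    X_source : s labeled source-domain samples
    y_source : labels for X_source
    beta     : estimated density ratio beta(x_i) for each source sample
    """
    # Predictions of each model on the target samples, shape (m, t)
    Ft = np.stack([f(X_target) for f in models])
    # Empirical Gram matrix: G[k, u] = (1/t) sum_i <f_k(x_i), f_u(x_i)>
    G = Ft @ Ft.T / Ft.shape[1]
    # Predictions on the source samples, shape (m, s)
    Fs = np.stack([f(X_source) for f in models])
    # Empirical vector: g[k] = (1/s) sum_i beta(x_i) <y_i, f_k(x_i)>
    g = Fs @ (beta * y_source) / len(y_source)
    # Pseudo-inverse guards against a singular G (highly correlated models)
    c = np.linalg.pinv(G) @ g
    return lambda X: sum(ck * f(X) for ck, f in zip(c, models))

# Toy check: two scalar models f1(x)=x, f2(x)=2x; the Bayes predictor is 1.5x
models = [lambda x: x, lambda x: 2.0 * x]
X = np.array([1.0, 2.0, 3.0])
f_agg = aggregate(models, X, X, 1.5 * X, np.ones(3))
```

With no domain shift ($\beta \equiv 1$) the aggregate recovers $1.5x$ exactly, even though neither single model in the sequence does; the pseudo-inverse is used because models trained from similar hyper-parameter settings often produce a nearly singular Gram matrix.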
Our method defines the state of the art in parameter choice for UDA, as it outperforms the current best practices in model selection with theoretical guarantees. We also performed large-scale empirical evaluations on a text-based domain (AmazonReviews (McAuley, 2021)), an image-based domain (DomainNet (Peng et al., 2018)), and four time-series datasets (Ragab et al., 2023), featuring a heterogeneity dataset for human activity recognition, electroencephalogram recordings of sleep activities, as well as smartphone and smartwatch activity and biometrics datasets.

1.3 In-Context Learning

Large language models (LLMs) have recently demonstrated promising capabilities (Kojima et al., 2022; Huang et al., 2022; Ruis et al., 2022; Weng et al., 2023; Fang et al., 2024). State-of-the-art models typically exceed 10 billion parameters and exhibit basic reasoning capabilities through in-context learning (Brown et al., 2020), as seen with Chain-of-Thought (Wei et al., 2022) and Tree-of-Thought prompting (Yao et al., 2023).
In-context learning is an emergent phenomenon in LLMs that occurs after extensive pretraining on vast datasets. The models have been trained on countless domains and tasks, and their in-context learning capabilities enable them to generalize across various tasks and domains. Unlike traditional learning paradigms that rely on gradient-based updates to adjust model parameters, in-context learning leverages the model's ability to adapt to specific tasks without changing its underlying weights. We illustrate this with the following example.
Example 1. Prompt-based Instructions:
A typical in-context learning prompt is constructed from an instruction, e.g., "Perform a fuzzy comparison between the following statements and determine if they are approximately equal or not:", followed by some contextual patterns as few-shot examples:

1 = "ONE" => True,
6.0 = "SIX" => True,
1 = "two" => False,
"five" = 5 => True,
"ten" = 10 => True,
4 = "Three" => False,
"four" = "FOUR" => True,

and the new query statement which we want the LLM to complete:

"eight" = 8 =>
The LLM leverages the provided examples to create associations between numbers and words, allowing it to interpret the fuzzy comparison between the presented values and extrapolate to similar examples. This enables the LLM to infer that "eight" and "8" are approximately equal in this context, leading it to predict "True" as the correct outcome for the new query statement.
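A prompt like the one in Example 1 can be assembled programmatically. The helper below and its exact string layout are illustrative assumptions, not a fixed API:

```python
def build_prompt(instruction, examples, query):
    """Concatenate an instruction, few-shot context pairs, and an open query."""
    lines = [instruction]
    lines += [f"{statement} => {label}" for statement, label in examples]
    lines.append(f"{query} =>")  # left open for the LLM to complete
    return "\n".join(lines)

prompt = build_prompt(
    "Perform a fuzzy comparison between the following statements "
    "and determine if they are approximately equal or not:",
    [('1 = "ONE"', "True"), ('1 = "two"', "False"), ('"five" = 5', "True")],
    '"eight" = 8',
)
print(prompt)
```

The few-shot pairs play the role of the labeled context $c$, while the trailing open statement is the unlabeled query $s$ the model is asked to complete.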
Large Language Models To begin our analysis, we first consider a simple prompt: "Hello, my world!", and its possible tokenization:
["He", "l", "l", "o", ", ", "my", "w", "o", "r", "ld", "!"]
The vocabulary V might then include these tokens and many others:
V = {"He", "l", "o", "w", "r", "ld", "!", ",", "my", "us", "of", ...}
This example illustrates how a sentence is converted into tokens, and how the vocabulary contains whole words, subword units and characters.
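A minimal greedy longest-match tokenizer over such a toy vocabulary illustrates the mechanism. Production tokenizers (e.g., BPE) are learned from data; this sketch, including the explicit whitespace token, is a simplifying assumption:

```python
def tokenize(text, vocab):
    """Greedily match the longest vocabulary entry at each position."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # longest candidate first
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens

# Toy vocabulary with an explicit whitespace token
V = {"He", "l", "o", "w", "r", "ld", "!", ",", " ", "my", "us", "of"}
tokens = tokenize("Hello, my world!", V)
print(tokens)
```

The sentence decomposes into a mix of subword units ("He", "ld") and single characters, mirroring how real vocabularies contain whole words, subwords, and characters.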
More formally, let $V$ be a finite set representing the vocabulary of the language model. We then define $V^*$ as the set of all possible token sequences (including the empty sequence) of elements from the finite set of tokens $V$. We call an element $x \in \mathcal{X} \subseteq V^*$ a sequence of tokens from the input space and an element $y \in \mathcal{Y} \subseteq V^*$ a sequence of tokens from the output space.
For our analysis, we first define the subspace $\mathcal{P} \subseteq \mathcal{X} \subseteq V^*$ as the space of all meaningful prompts and subspaces $\mathcal{I} \subseteq \mathcal{X}$, $\mathcal{C} \subseteq \mathcal{X} \times \mathcal{Y}$, and $\mathcal{S} \subseteq \mathcal{X}$ representing the instruction space, context space, and query space, respectively. We then define the prompt input $x \in \mathcal{P} = \mathcal{I} \times \mathcal{C} \times \mathcal{S}$ as a tuple $x = (i, c, s)$, where $i \in \mathcal{I}$, $c \in \mathcal{C}$, and $s \in \mathcal{S}$. Now, let $f: \mathcal{P} \to \mathcal{Y}$ represent the LLM which predicts outputs with desired targets $y = f(x) \in \mathcal{Y}$.
In-Context Learning and Domain-Invariant Learning In the context of domain-invariant learning,we consider the prompt domain P ,where samples are drawn from two probability distributions over this domain: a source distribution p(i,c,s) and a target distribution q(i,c,s) . The basic intuition is that an LLM f learns representations that are invariant across different contexts, enabling generalization from source samples to target samples. In-context learning then minimizes the expected target risk Rq(f) not by explicitly updating the model parameters, but through implicit adaptation based on the provided context examples, which is often referred to as prompt engineering (Nori et al., 2023).
The in-context learning approach can then be summarized as follows:
  1. Encoding: Given an embedding function $\rho: V \to \mathbb{R}^d$ with $d \geq 1$, we encode the instructions $i \in \mathcal{I}$, context $c \in \mathcal{C}$, and query statement $s \in \mathcal{S}$ into vector representations:
$i_{1:m} = (\rho(i_1), \rho(i_2), \ldots, \rho(i_m)) = (\rho(i_j))_{j=1}^{m} \in \mathbb{R}^{m \times d},$
$c_{1:k} = (\rho(c_1), \rho(c_2), \ldots, \rho(c_k)) = (\rho(c_j))_{j=1}^{k} \in \mathbb{R}^{k \times d},$
$s_{1:n} = (\rho(s_1), \rho(s_2), \ldots, \rho(s_n)) = (\rho(s_j))_{j=1}^{n} \in \mathbb{R}^{n \times d}.$
  2. Concatenation: The embedded vectors are concatenated into a single sequence (with lengths $k_1$ and $k_2$ of the two context sequences):

$z_{1:j} = [i_{1:m}, c_{1:k_1+k_2}, s_{1:n}] = (z_i^1)_{i=1}^{j}, \quad j = m + k_1 + k_2 + n.$
  3. Transformations: Transformation functions $\varphi_{\mathrm{trans}}^{l}$ (e.g., attention mechanisms, feed-forward neural networks, normalization, etc.) are applied across different hidden layers with $1 < l < L$, such that the latent sequence $z_{1:j}^{l-1}$ produces the subsequent latent representation $z_{1:j}^{l}$ with latent dimension $d_l$:

$z_{1:j}^{l} = \varphi_{\mathrm{trans}}^{l}(z_{1:j}^{l-1}) \in \mathbb{R}^{j \times d_l}.$
  4. Decoding: The last latent representation $z_{1:j}^{L} \in \mathbb{R}^{d_L}$ usually aggregates the sequence before it is decoded to obtain the output $y \in V$:

$y = \varphi_{\mathrm{dec}}(z_{1:j}^{L}) \in V,$

where $\varphi_{\mathrm{dec}}: \mathbb{R}^{d_L} \to V$ maps from the latent space to a token in the output space.
  5. Auto-regression: In this step, the prediction of the decoder is usually appended to the input, and the process repeats from steps 1 to 4 until halting criteria, such as a stop token, are encountered. This auto-regressive process generates the predicted sequence $\hat{y} \in V^*$.
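The five steps can be condensed into a schematic generation loop. The random embedding table, the tanh "layer", and the mean-pooled greedy decoder below are toy stand-ins for $\rho$, the $\varphi_{\mathrm{trans}}^{l}$ layers, and $\varphi_{\mathrm{dec}}$; none of them reflect a real trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, STOP, D = 100, 0, 16           # toy vocabulary size, stop token id, dim

E = rng.normal(size=(VOCAB, D))       # embedding table, stands in for rho
W_dec = rng.normal(size=(D, VOCAB))   # decoding head, stands in for phi_dec

def transform(z):
    """Stand-in for the stacked transformations phi_trans^l (step 3)."""
    return np.tanh(z)

def decode_next_token(z):
    """Aggregate the sequence and project to the vocabulary (step 4)."""
    logits = z.mean(axis=0) @ W_dec
    return int(np.argmax(logits))     # greedy decoding

def generate(prompt_ids, max_new_tokens=8):
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):   # step 5: auto-regression
        z = transform(E[ids])         # steps 1-2: encode and concatenate
        next_id = decode_next_token(z)
        ids.append(next_id)           # append prediction to the input
        if next_id == STOP:           # halting criterion
            break
    return ids

out = generate([5, 7, 9])
```

Each iteration re-encodes the full sequence including all previously generated tokens, which is exactly what makes the process auto-regressive.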
We view in-context learning as deeply rooted in DA and related to domain-invariant learning, since LLMs seem to learn shared feature representations $\varphi_{\mathrm{trans}}^{l}(\cdot)$ that generalize across diverse tasks and domains without explicit retraining. This is because LLMs are trained on a large corpus of data that includes various prompt-to-prediction pairs from different domains. This training enables LLMs to form domain-invariant representations that generalize across different tasks, even those they have not specifically been trained for. In our paper Dinu et al. (2024a), we accredit this to the associations formed between different domains during training based on the instructions and contextual sequences for problem-specific tasks.
We draw this conclusion based on the evidence of analyzing the memorization and learning capabilities of LLMs (Carlini et al., 2021; Min et al., 2022; Hartmann et al., 2023; Oswald et al., 2023; Duan et al., 2024), polysemanticity and capacity in neural networks (Scherlis et al., 2022; Zhao et al., 2024), and semantic feature spaces (Wolfram, 2023). To illustrate, Figure 1.2 presents a UMAP projection (McInnes et al., 2018) of the latent space, visualizing the semantic relationships between four different domains based on their embedded representations in the LLM. We show the embedding activations of the last layer of the open-source GPT-Neo 1.3 billion parameter model (Black et al., 2021) for the respective domains. Here, domain represents a specific data distribution of token sequences. These domains are denoted as Mathematical, Programming, Natural Language, and Random domain. Each dataset includes examples representative of the data distribution of token sequences for the respective domains.
However, we also see an overlap between the first three domains, since they have semantically related features, such as "a = b" and "a equals b". This is possible because during training, the model developed a semantic mapping that captures cross-domain correspondences and contextual synonyms. In-context learning prompts then interpolate between these feature representations by establishing new in-context associations between sequences and, consequently, between data distributions, thereby enforcing domain-invariant predictions. The power of in-context learning lies in its ability to generalize representations across various tasks and domains by leveraging semantically related features for context-specific adaptations of data distributions.
Structure of In-Context Learning for Domain Adaptation LLMs are usually trained on associated pairs with instructions $i \in \mathcal{I}$, labeled contexts $c \in \mathcal{C}$ sampled w.r.t. their marginal source and target distributions respectively, and unlabeled query statements $s \in \mathcal{S}$. The contexts include expected labeled outcomes $y \in \mathcal{Y}$ that create a bridge between domains and help the model understand relationships between sequences of tokens. Specifically, we denote the following elements:
  • $i = (i_u)_{u=1}^{m}$ denotes the instruction coming from the marginal distribution $p_I$, where each $i_u$ represents a token in the instruction sequence.
Figure 1.2: UMAP projection from the latent embeddings of four domains before the classification layer of a GPT-Neo 1.3 billion parameters LLM. The domains are Mathematical, Programming, Natural Language and Random. Each domain includes 40 samples in their respective domain-specific formulation. We see semantically similar concepts cluster together across domains, highlighting the overlap between domains. The Random domain is furthest apart, since the embedded sequences do not share semantic overlap with the other domains.
  • $C_p = \{c_p^v = [(c_{v,t})_{t=1}^{l_v^1}, (y_{v,w})_{w=1}^{h_v^1}]\}_{v=1}^{k_1}$ represents source domain examples, where $c_{v,t}$ is the $t$-th token of the $v$-th context sequence of length $l_v^1$, and $y_{v,w}$ is the $w$-th token of the corresponding label sequence of length $h_v^1$, coming from the marginal distribution $p_C$.
  • $C_q = \{c_q^v = [(c_{v,t})_{t=1}^{l_v^2}, (y_{v,w})_{w=1}^{h_v^2}]\}_{v=1}^{k_2}$ represents target domain examples, where $c_{v,t}$ is the $t$-th token of the $v$-th context sequence of length $l_v^2$, and $y_{v,w}$ is the $w$-th token of the corresponding label sequence of length $h_v^2$, coming from the marginal distribution $q_C$.
  • $s = (s_j)_{j=1}^{n}$ is an unlabeled query statement coming from the marginal distribution $q_S$, where each $s_j$ represents a token in the query sequence.
For example, in a sentiment analysis task across different product categories, $i$ might be "Classify the sentiment as positive or negative:", $c_p$ could include labeled examples like ("This book was a page-turner!", "very positive") and ("The plot was predictable", "somewhat negative") from the book domain, $c_q$ could include ("The battery life is impressive", "extremely positive") and ("The screen cracks easily", "very negative") from the electronics domain, and $s$ could be an unlabeled review "My new book arrived with a crack on its cover" to classify.
We then formulate the input to the model as $z^1 = \varphi_{\mathrm{enc}}(i, c_p, c_q, s; \rho)$, where $\varphi_{\mathrm{enc}}: \mathcal{I}^m \times \mathcal{C}^{k_p} \times \mathcal{C}^{k_q} \times \mathcal{S}^n \to \mathbb{R}^{j \times d}$ represents the encoding and concatenation function based on the embedding model $\rho$ (see in-context learning approach steps 1 and 2), with $k_p$ and $k_q$ the respective lengths of the context variables and $j$ the total sequence length. The function $\varphi_{\mathrm{enc}}$ utilizes two domains and outputs a latent embedding for the input of subsequent layers $\varphi_{\mathrm{trans}}^{l}$. We omit the sequence length indices for brevity of notation.
Problem Statement The aim is to form non-trivial in-context associations that enable the model to predict $\hat{y} = y \in \mathcal{Y}$, and to formulate in-context learning with the principles of UDA, leveraging few labeled data pairs from both domains to learn associations and generalize to unlabeled query statements.
By learning from these in-context associations during training, the LLM is conditioned to associate concepts that are related, and therefore can learn domain-invariant latent representations. This process aims to minimize the discrepancy between distributions of source and target components in the latent space. Specifically, for any layer index $l$, the transformation $\varphi_{\mathrm{trans}}^{l}$ should map the source component $z_p^{l-1}$ and the target component $z_q^{l-1}$ of $z^{l-1}$ to similar regions in the latent space, such that $\varphi_{\mathrm{trans}}^{l}(z_p^{l-1}) \approx \varphi_{\mathrm{trans}}^{l}(z_q^{l-1})$.
We obtain $z_p^1$ by applying the encoding function to the source context sequences only:

(1.3)  $z_p^1 = \varphi_{\mathrm{enc}}(0, c_p, 0, 0; \rho),$
where $0$ represents a placeholder (such as a zero vector or a special masking token) used in place of the instruction, target context, and query components. Similarly, for the target component:

(1.4)  $z_q^1 = \varphi_{\mathrm{enc}}(0, 0, c_q, 0; \rho).$
This formulation explicitly shows how we isolate the source and target contexts in the encoding process, allowing us to analyze their representations separately in the subsequent transformations.
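Equations (1.3) and (1.4) can be sketched as a masked encoder. The fixed per-component lengths and the hash-like toy embedding below are assumptions for illustration only:

```python
import numpy as np

D = 8  # embedding dimension

def rho(token):
    """Toy deterministic embedding of a single token."""
    seed = sum(token.encode("utf-8")) % 1000
    return np.random.default_rng(seed).normal(size=D)

def phi_enc(i, c_p, c_q, s, lengths=(2, 3, 3, 2)):
    """Encode and concatenate (i, c_p, c_q, s); a None component is
    replaced by zero-vector placeholders of its reserved length."""
    parts = []
    for component, length in zip((i, c_p, c_q, s), lengths):
        if component is None:
            parts.append(np.zeros((length, D)))   # the 0 placeholder
        else:
            parts.append(np.stack([rho(t) for t in component]))
    return np.concatenate(parts)

z_p = phi_enc(None, ["good", "=>", "pos"], None, None)   # cf. Eq. (1.3)
z_q = phi_enc(None, None, ["fine", "=>", "pos"], None)   # cf. Eq. (1.4)
```

Both encodings share the same sequence layout, so the latent positions of the source and target components can be compared directly in subsequent layers.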
Implicit Contextual Associations Such associations are often formed implicitly through token sequences that have specific properties attached to them, i.e., the "equals" statement is related to "=" or "the same" which allows for substitutions, and are innate to many domains such as mathematics or programming. Therefore, the models profit from many such contextual examples, which also enables us to evaluate unconventional queries as shown in the following observations.
Let us examine several examples to illustrate the capabilities of two state-of-the-art language models, Claude 3.5 Sonnet and GPT-4 (OpenAI, 2023; Anthropic, 2024), both of which were trained using in-context learning.
Example 2. Consider the following arithmetic query:
⑨ ＋ ② − ② × ⑤ ＝
This query, when interpreted by a model, returns a similar prediction as shown below:

Let's solve this step-by-step, following the order of operations (PEMDAS: Parentheses, Exponents, Multiplication/Division, Addition/Subtraction):

First, let's identify the numbers:
⑨ = 9
② = 2
⑤ = 5

Now, our equation looks like this:
9 + 2 - 2 * 5 =

We start with multiplication:
2 * 5 = 10

Now our equation is:
9 + 2 - 10 =

We perform addition and subtraction from left to right:
9 + 2 = 11
11 - 10 = 1

Therefore:
9 + 2 - 2 * 5 = 1

The final answer is 1.

The model predicts the answer 1, which is semantically correct for a human reader. However, it is interesting to observe that the model interprets the symbols ⑨, ②, ⑤, and ＋ as the numbers 9, 2, 5, and the + operator, respectively. These symbols are entirely different Unicode encodings and are mapped to different embeddings, yet the model successfully associates them with their corresponding mathematical representations.
Let us now explore how this capability extends to other symbols.
Example 3. Consider an operation between two weather symbols:

☀ + 🌧 =
When evaluated, the models return a similar prediction as follows:
☀ + 🌧 = 🌈
The combination of a sun symbol (☀) and a rain symbol (🌧) typically represents a rainbow (🌈). This is because rainbows occur in nature when sunlight and rain are present simultaneously ...

In this case, not only are the symbols ☀, 🌧, and 🌈 associated with the words sun, rain, and rainbow, respectively, but we also observe that the + and = operators follow the same semantics as in the arithmetic example above. This showcases a model's ability to blend different domains - in this case, weather phenomena and basic arithmetic operations - to produce a meaningful interpretation.
However, it is important to note that not every query is parsable, as demonstrated in the following example.
Example 4. Consider a query using the Braille alphabet with numbers and the + operator:

1 + ⠃ =
In this case, the Claude model struggles to associate the number 1 with the two dots to infer a semantic operation and predict a number:

To solve this, we need to understand what the symbol ⠃ represents.
This symbol is a Braille character, but its numerical value isn't universally defined in the same way as the earlier emoji numbers were. Without additional context about how ...

This example illustrates the limitations of the model's ability to make associations across different symbol systems, especially when the relationship between the symbols is not explicitly defined or commonly used.
Now, to showcase the capabilities of in-context learning, we will add two examples to help the model understand how to interpret the query.
Example 5. Consider the adapted query using the Braille alphabet with numbers and the + operator, and two added in-context examples:

2 + ⠁ = 3
1 + ⠙ = 5
1 + ⠃ =
In this case, both models interpret the context and associate the Unicode encoding with the in-context examples to deduce the correct answer:
Let's approach this step-by-step:

From the first equation, we can deduce:
2 + ⠁ = 3
⠁ = 3 - 2 = 1

From the second equation:
1 + ⠙ = 5
⠙ = 5 - 1 = 4

Now, we need to solve:
1 + ⠃ = ?

We can see a pattern forming. The Braille symbols seem to represent numbers:
⠁ represents 1
⠃ would likely represent 2
⠙ represents 4

Therefore, the equation 1 + ⠃ would be equivalent to 1 + 2

So, the answer is:

1 + ⠃ = 3

Would you like me to explain any part of this reasoning in more detail?
Note that the ⠙ symbol represents the number 4, not 3, since it is the fourth entry in the encoding table of the Braille sequence.
The last example shows that, with only two provided examples, the models are able to associate the domains of numbers with the Braille alphabet and compute the expected prediction (even though the association for the number 4 for a human reader is non-intuitive).
In practice, these associations are often more subtle, since they occur in permutations of prompts in $\mathcal{X} \times \mathcal{Y}$ that have no explicitly separated instruction, context, query statement, and label declarations. To illustrate this concept of domain blending within in-context learning, let's examine a final, more complex example.
Example 6. We present the models with a prompt that combines two disparate domains: Python programming and Shakespearean literature. The prompt reads: "Write a simple Python function to check if a file exists in Shakespeare style". This unusual combination challenges the model to seamlessly integrate knowledge from both disjoint domains.
The resulting prediction demonstrates the model's ability to interpret novel in-context associations without additional context samples:


import os

def doth_file_exist(file_path):
    """
    To exist, or not to exist, that is the question:
    Whether 'tis nobler in the mind to suffer

    ...

    And lose the name of action. -- Soft you now,
    The fair file! -- Nymph, in thy orisons
    Be all my sins remember'd.

    Args:
        file_path (str): The path whose existence is to be verified.

    Returns:
        bool: True if the file exists, False otherwise.
    """
    return os.path.exists(file_path)

file_path = "path/to/your/file.txt"
if doth_file_exist(file_path):
    print("Verily, the file exists!")
else:
    print("Alas, the file does not exist.")

The model predicts tokens from both the Python programming and Shakespearean literature distributions, showcasing the model's ability to blend seemingly disparate domains, even though such combinations are highly unlikely in real-world scenarios.
In the last example, the instruction is defined within the sequence Write a ... in ... style. The source and target domain contexts are present in an abstract form as Python programming and Shakespeare, respectively. The query statement is implicitly provided during inference, as LLMs typically employ special tokens to delineate user prompts from model-generated responses. Although the models learned during training to represent both domains independently, due to semantic overlap between tokens, at inference both models are able to interpolate between the domain representations and predict output sequences coming from the joint distribution.
In-Context Associations and Attention To understand the latent representations $\varphi_{\mathrm{trans}}^{l}(\cdot)$ in an LLM and how they could form domain-invariant features, we also examine the token frequency and embedding distributions, and the self-attention mechanism of modern Transformer architectures based on the GPT-Neo model. In Figure 1.3 we show the token frequency per domain and the embeddings after the encoding phase and before the pre-classification latent layer. The first row shows the absolute frequency of tokens per domain, with the y-axis normalized by the 99-percentile of the maximum value obtained across all domains. The x-axis shows only the used unique token IDs across all four domains, concatenated in order, to enable comparison of the token frequencies. The second and third row show the normalized embedding distribution for the encoded tokens after the tokenization and encoding phase and the latent embeddings before the classification layer, respectively. Despite the initial differences in token frequencies and distributions in the lower layers, the normalization and transformation processes within the LLM architecture result in centered values (around zero) and aligned embedding distributions across domains, setting the premise for achieving domain-invariant representations as illustrated in the third row.
Figure 1.3: We show the distribution of the tokens across four domains, namely Mathematical, Programming, Natural Language and Random domain. The first row shows on the y-axis the 99-percentile normalized frequency of tokens. The x-axis shows the used unique tokens across all four domains, concatenated in order. The second and third row show the normalized embedding distribution for the encoded tokens after the tokenization and encoding phase and the latent embeddings before the classification layer, respectively.
The self-attention mechanism is an associative memory (Ramsauer et al., 2020) that computes a scaled dot-product between query (Q) (unrelated to the previously mentioned query statement of the input sequence), key (K), and value (V) matrices. These matrices are linear projections of the input sequence represented as a matrix $z \in \mathbb{R}^{j \times d}$, where $j$ is the sequence length and $d$ is the embedding dimension:
$Q = zW_Q, \quad W_Q \in \mathbb{R}^{d \times d_k},$
$K = zW_K, \quad W_K \in \mathbb{R}^{d \times d_k},$
(1.5)  $V = zW_V, \quad W_V \in \mathbb{R}^{d \times d_v},$
where $W_Q$, $W_K$, and $W_V$ are learnable weight matrices for the queries, keys, and values, respectively. The matrix multiplication $zW$ projects each token embedding in $z$ to a new space, creating the corresponding $Q$, $K$, or $V$ representation. Following Vaswani et al. (2017), we typically set $d_k = d_v = d/h = 64$, where $d_k$ is the dimensionality of the key vectors, and $h = 8$ is the number of attention heads in a multi-head attention setting.
The scaled dot-product is commonly known as attention scores, which we refer to as our transformation function $\varphi_{\mathrm{trans}}(\cdot)$, and is computed as follows:
(1.6)  $\varphi_{\mathrm{trans}}(z) := \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V,$
where the scaling factor $\sqrt{d_k}$ is introduced to mitigate the effect of large dot products in high-dimensional spaces. This scaling helps maintain stable gradients during training, particularly for larger values of $d_k$, and prevents the softmax function from producing small gradients in regions where the input is large in magnitude.
The self-attention function computes a weighted sum of the rows of the value matrix V (each row being a value vector), with weights derived from the similarity between corresponding rows of the query matrix Q and rows of the key matrix K. Here, self-attention refers to the fact that Q, K, and V all share the same sequence matrix z. It is worth noting that in other attention settings, different sequence representations can be used for each component. The resulting attention output φ_trans(·) associates information from different parts of the input sequence, capturing the dependencies and relationships between tokens. For networks with multiple layered transformations φ_trans^l, l > 1, the output is usually normalized before being passed on to subsequent layers. The respective matrices W_Q^l, W_K^l, W_V^l learn these associations for domain-specific tasks and generalize to the semantics based on the instructions and contextual patterns. This generalization mechanism later allows the model to extrapolate semantics across diverse domains when provided with few-shot examples, effectively enabling DA through the formation of context-dependent associations in the attention layers.
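As an illustration, the computation in Equation (1.6) can be sketched for a single attention head in NumPy. This is a minimal sketch with toy dimensions and random weights, not a full multi-head implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # shift for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def attention(z, W_q, W_k, W_v):
    """Scaled dot-product self-attention phi_trans(z) for a single head."""
    Q, K, V = z @ W_q, z @ W_k, z @ W_v               # project tokens to Q, K, V
    d_k = K.shape[-1]
    scores = softmax(Q @ K.T / np.sqrt(d_k))          # (j, j) attention weights
    return scores @ V                                  # weighted sum of value rows

rng = np.random.default_rng(0)
j, d, d_k = 5, 512, 64                                # sequence length, model dim, head dim
z = rng.normal(size=(j, d))
W_q, W_k, W_v = (rng.normal(size=(d, d_k)) / np.sqrt(d) for _ in range(3))
out = attention(z, W_q, W_k, W_v)
print(out.shape)  # (5, 64)
```

Each output row is a convex combination of the value rows, since the softmax rows sum to one.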
Domain-Invariance in Language Models In the previous sections, we have seen that LLMs are capable of forming domain-invariant predictions based on our empirical observations, and attributed this to associative patterns learned during training. We now conclude our analysis by defining domain-invariance in an LLM setting, and propose a measurable relationship between the semantic similarity of domains and the convergence of their embedded representations. This relationship forms the basis for future work on quantifying the models' capabilities for cross-domain generalization.
We note that in the general case an LLM is a function f_θ : P → V with parameters θ ∈ Θ, which produces a sequence of outputs based on the prompt. This is due to the auto-regressive setting of predicting and sampling the next token and appending it to the input, which is repeated until a halting criterion, such as a stop token, terminates the sequence. In this setting, a decoder typically samples according to some sampling strategy (e.g., using a temperature parameter, beam search, etc.) to obtain the most likely sequence of tokens. However, for simplicity, the following definition focuses only on a special case where we omit the auto-regressive process and assume that the context C ⊆ X × V considers the output space equal to the vocabulary, and the LLM predicts only the next token in V. Furthermore, to define domain invariance as shown in Example 1, we assume that "True" and "False" are tokens and part of the vocabulary, and then formalize the definition as follows:
Let V be the vocabulary of tokens of an LLM and let P be the prompt space, defined as P = I^m × C^{k_p} × C^{k_q} × S^n with instruction space i ∈ I ⊆ X, context space c ∈ C ⊆ X × V, and query space s ∈ S ⊆ X. Let p(i, c, s) and q(i, c, s) denote the joint distributions for the source and target domains respectively, with p_C and q_C being their respective marginal distributions over the context space. The sequence dimensions m, k_p, k_q, n are each ≥ 1, the sequence length is j = m + k_p + k_q + n, and the first latent dimension is d_1 ≥ 1.
Now, let an LLM be a function f_θ : P → V with parameters θ ∈ Θ, with transformation layers φ_trans^l : ℝ^{j×d_{l−1}} → ℝ^{j×d_l} and dimensions d_l for 1 < l < L, and the last transformation layer φ_trans^L : ℝ^{j×d_{L−1}} → ℝ^{d_L} before the decoder φ_dec : ℝ^{d_L} → V, where the output dimension d_L = |V| ≥ 1 is equal to the vocabulary size.
Definition 1 (Domain-Invariant Language Models). We say an LLM has domain-invariant properties if, for all pairs of contextual samples with equivalent semantics, represented by c_p ≅ c_q, where c_p ∈ C_p comes from the marginal distribution p_C and the semantically equivalent c_q ∈ C_q comes from the marginal distribution q_C, the argmax (greedy sampling decoder) of their outputs from the last transformation layer φ_trans^L is equal. Formally:
∀(z_p^L, z_q^L) ∈ Z_p^L × Z_q^L with z_p^L = φ_{1:L}(c_p), z_q^L = φ_{1:L}(c_q), and c_p ≅ c_q:

argmax_{i ∈ {1,…,|V|}} [z_p^L]_i = argmax_{i ∈ {1,…,|V|}} [z_q^L]_i    (1.7)
where z_p^L and z_q^L are the outputs of the last transformation layer derived from c_p and c_q respectively, with z_p^1 = φ_enc(0, c_p, 0, 0; ρ) and z_q^1 = φ_enc(0, 0, c_q, 0; ρ). Here, φ_enc : P → ℝ^{j×d_1} is the embedding and concatenation function, ρ : V → ℝ^{d_1} is an embedding function, and 0 denotes the mask tokens for the components. The latent representations Z_p^l = {z_{p,u}^l = φ_{1:l}(c_{p,u})}_{u=1}^{h_p} and Z_q^l = {z_{q,u}^l = φ_{1:l}(c_{q,u})}_{u=1}^{h_q} come from the respective domains with h_p ≥ 1 and h_q ≥ 1 samples, obtained by applying the composition function up to layer index l, where φ_{1:l} = φ_trans^l ∘ φ_trans^{l−1} ∘ ⋯ ∘ φ_trans^2 ∘ φ_trans^1 ∘ φ_enc.
The semantic equivalence symbol ≅ refers to selecting consistently equivalent samples from the respective sets, e.g., the number "1" and the string "one", which are then pairwise embedded, transformed, and compared. We then stipulate that for all these pairwise latent representations, their argmax must be equal, ensuring they map to the same next token. The determination of semantic equivalence is incumbent upon the modeling process of the distributions and is an integral part of the training process design. This definition illustrates how the phenomenon of cross-domain generalization in LLMs can emerge when a sufficiently large and diverse training data corpus is used. The definition of domain-invariant properties in LLMs and the semantic equivalence can be viewed as a language-based extension of the Curry-Howard correspondence, which originally establishes a direct relationship between computer programs and mathematical proofs (Howard, 1980).
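For a finite set of semantically equivalent pairs, Definition 1 reduces to an argmax comparison of final-layer outputs. The following toy sketch makes this concrete; the linear map, vocabulary size, and context vectors are hypothetical stand-ins for φ_{1:L} and real embeddings:

```python
import numpy as np

def is_domain_invariant(phi_last, pairs):
    """Check Definition 1 on a finite set of semantically equivalent context
    pairs (c_p, c_q): the greedy (argmax) next token must agree."""
    return all(np.argmax(phi_last(c_p)) == np.argmax(phi_last(c_q))
               for c_p, c_q in pairs)

# Toy stand-in for phi_{1:L}: a linear map onto logits over a 4-token
# vocabulary; contexts are 2-d feature vectors.
W = np.array([[2.0, 0.1], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.2]])
phi_last = lambda c: W @ c

# One hypothetical pair of semantically equivalent contexts
# (e.g. embeddings of the contexts "1" and "one").
pairs = [(np.array([1.0, 0.0]), np.array([0.9, 0.1]))]
print(is_domain_invariant(phi_last, pairs))  # True
```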
Domain-Invariant Associations While Definition (1) provides a strict criterion for domain invariance, in practice we often have only a degree of similarity between domain representations, especially in an auto-regressive generative process. To quantify this similarity, we introduce a semantic distance measure that captures the closeness of representations of the model's layers, and provide an ε-close domain-invariance approximation.
Let F_Θ be the class of LLMs with ε-close domain-invariant associations:

F_Θ = { f_θ : P → Y | θ ∈ Θ, f_θ with ε-close domain-invariant associations }
For a model f_θ ∈ F_Θ, we define the semantic distance between domains represented by C_p and C_q as:
d_{sem,θ}(C_p, C_q) = min_{1 < l ≤ L} max_{z_p^l ∈ Z_p^l, z_q^l ∈ Z_q^l} ‖z_p^l − z_q^l‖₂    (1.8)
where z_p^l and z_q^l are the input representations for the respective layer index l based on C_p and C_q, following Definition (1).
We say that two domains with contextual samples C_p and C_q have ε-close domain-invariant associations if d_{sem,θ}(C_p, C_q) ≤ ε, thereby setting an upper bound on the difference in representations measured by d_{sem,θ}. The semantic distance provides an approximation of the degree of domain invariance formed through associations, with smaller distances indicating greater similarity between domain representations. It also captures the idea that as the distance between the transformed representations of samples from two domains decreases, the semantic distance between these domains also decreases. The semantic distance d_{sem,θ} therefore quantifies this relationship and can be applied to a variety of domains, considering factors such as semantic overlap, structural similarity, and task-specific relevance.
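Equation (1.8) can be computed directly from per-layer representations. A minimal NumPy sketch, assuming toy per-layer sample sets for the two domains:

```python
import numpy as np

def semantic_distance(Z_p, Z_q):
    """d_sem,theta(C_p, C_q) per Eq. (1.8): the minimum over layers l of the
    maximum pairwise L2 distance between layer-l representations of the two
    domains. Z_p, Z_q are lists indexed by layer; each entry has shape
    (num_samples, d_l)."""
    per_layer = []
    for zp, zq in zip(Z_p, Z_q):
        diffs = zp[:, None, :] - zq[None, :, :]        # (h_p, h_q, d_l)
        per_layer.append(np.linalg.norm(diffs, axis=-1).max())
    return min(per_layer)

rng = np.random.default_rng(1)
L, d = 3, 8
Z_p = [rng.normal(size=(1, d)) for _ in range(L)]      # one sample per layer
Z_q = [z + 0.01 for z in Z_p]                          # slightly shifted "domain"

eps = 0.1
print(semantic_distance(Z_p, Z_q) <= eps)  # True: eps-close associations
```

Note that the max runs over the full product set Z_p^l × Z_q^l, so within-domain spread also contributes to the distance.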
In practice, the argmax is not applied; instead, more sophisticated auto-regressive sampling techniques are used. Therefore, it suffices that the last-layer representations z_p^L and z_q^L are close enough to semantically map to a similar distribution of representations. This is supported by empirical evidence that suggests the existence of semantic meaning spaces or a semantic law of motion, as described by Stephen Wolfram (Wolfram, 2023). The ε-close domain-invariant association offers a practical relaxation when perfect invariance is unattainable. This approach provides a quantifiable measure of similarity between domain representations. In an auto-regressive setting, it can leverage the semantic closeness between tokens to guide the semantic space towards the desired outcome.
Furthermore, if we assume that all transformation functions φ_trans^π of the model f_θ with π > l are Lipschitz continuous with Lipschitz constants K_π, then for all z_p^L, z_q^L based on C_p and C_q coming from the respective marginal distributions, the upper bound on the difference in representations in the last layer is given by:
∀(z_p^L, z_q^L) ∈ Z_p^L × Z_q^L : ‖z_p^L − z_q^L‖₂ ≤ ε ∏_{π=l+1}^{L} K_π.    (1.9)
See also the Lipschitz constant of self-attention (Kim et al., 2021).
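The bound in Equation (1.9) can be checked numerically for the simple case of linear layers, whose Lipschitz constant with respect to the L2 norm is the spectral norm. This is a sketch under that assumption only; attention layers require the dedicated analysis of Kim et al. (2021):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy transformation layers: linear maps, whose Lipschitz constant w.r.t.
# the L2 norm is the spectral norm (largest singular value).
layers = [rng.normal(size=(8, 8)) for _ in range(3)]
K = [np.linalg.norm(W, ord=2) for W in layers]

z_p, z_q = rng.normal(size=8), rng.normal(size=8)
eps = np.linalg.norm(z_p - z_q)            # representation gap at layer l

for W in layers:                           # propagate through pi = l+1, ..., L
    z_p, z_q = W @ z_p, W @ z_q

bound = eps * np.prod(K)                   # right-hand side of Eq. (1.9)
print(np.linalg.norm(z_p - z_q) <= bound + 1e-9)  # True
```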
Hypothesis on Domain-Invariant Associations The described setting now provides a framework to formalize the relationship between in-context learning and domain-invariant representation learning in LLMs. Based on this, we propose the following hypothesis:
Hypothesis 1. For an LLM f_θ ∈ F_Θ with ε-close domain-invariant associations the following applies: Given any pair of samples from domains C_p (source) and C_q (target) with their respective latent representations, if their semantic distance d_{sem,θ}(C_p, C_q) ≤ ε for some ε ≥ 0, then:
Δ(q_{Y|X}, q_{f_θ(X)}) ≤ Δ(p_{Y|X}, p_{f_θ(X)}) + δ(ε),    (1.10)
where:
  • X is a random variable taking values in the input space X ,
  • Y is a random variable taking values in the output space Y ,
  • Δ is a discrepancy measure between probability distributions,
  • q_{Y|X} and p_{Y|X} are the conditional distributions of Y given X for the target and source domains respectively,
  • q_{f_θ(X)} and p_{f_θ(X)} are the distributions of the model's outputs f_θ(X) in the target and source domains respectively,
  • δ(ϵ) is a function that approaches zero as ϵ approaches zero.
This hypothesis formalizes the concept that when the semantic distance between samples of a source domain and a target domain is small, the model's discrepancy between its output distribution and the true output distribution on the target domain is bounded by its discrepancy on the source domain plus a small error term. This error term decreases as the semantic distance decreases.
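To make the quantities in Hypothesis 1 concrete, one can instantiate Δ with the total variation distance and compare toy next-token distributions. All numbers below are hypothetical, chosen only to illustrate the inequality:

```python
import numpy as np

def tv(p, q):
    """Total variation distance, one admissible choice for the discrepancy
    measure Delta between probability distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

# Hypothetical next-token distributions over a 3-token vocabulary.
p_true, p_model = [0.70, 0.20, 0.10], [0.60, 0.30, 0.10]   # source domain
q_true, q_model = [0.65, 0.25, 0.10], [0.60, 0.30, 0.10]   # close target domain

delta_eps = 0.05                                           # error term delta(eps)
print(tv(q_true, q_model) <= tv(p_true, p_model) + delta_eps)  # True
```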
In turn, this also highlights the model's limitations: when a query extrapolates beyond the model's domain capabilities, we obtain meaningless and potentially hallucinated results (Huang et al., 2023, 2024), as the model attempts to fill knowledge gaps with its existing, but irrelevant, capabilities. To some degree, this limitation is often obfuscated by blending the generative process with reinforcement learning fine-tuning from human feedback (Ouyang et al., 2022; Kaufmann et al., 2024). However, recent publications have begun to critically examine these aspects (Hicks et al., 2024). Despite these and other open challenges, the proposed intersection of domain-invariant learning and LLMs, along with the utilization of their operator-like patterns, offers potential for grounding prompt engineering in a formal framework and advancing NeSy approaches, particularly in bridging symbolic reasoning with the adaptive capabilities of neural networks.

1.4 A Neuro-Symbolic Perspective on Large Language Models

Building on the in-context learning capabilities of large language models (LLMs) and their ability to interpret in-context associations, we can define task-specific prompts that interpret various instructions and contextual patterns, and combine them in a neuro-symbolic (NeSy) approach. These prompts leverage the operator-like patterns learned by the model, such as equals, is like, and from ... to, to solve task-specific requirements without the necessity to update model parameters. This NeSy approach connects various domains via a language-based interface, utilizing the model's capacity to recognize and apply semantic patterns across domains.
We therefore consider LLMs suitable as semantic parsers (Dinu et al., 2024a) that can take in various expressions (which are operation descriptions defining how to manipulate strings), interpret them using learned associations, and predict the most probable statistical outcome. These expressions can be composed into complex functions, forming a multi-step generative process that leverages a model's domain-invariant capabilities.
The language-based interface is well-suited for a NeSy approach to interface with classical symbolic methods. We can interchange between solvers and various tools that can verify our predicted outcomes, taking advantage of the model's ability to bridge symbolic and sub-symbolic paradigms. This NeSy approach allows for more structured and rule-based approaches, where the model not only predicts statistical associations but also adheres to defined symbolic rules and expressions. This enhances our ability to verify results by combining the strengths of both neural and symbolic components.
SymbolicAI Framework In our work Dinu et al. (2024a), we propose a neuro-symbolic approach called SymbolicAI. The framework is designed to bridge the gap between symbolic reasoning and neural network capabilities, particularly with the help of LLMs.
At its core, SymbolicAI performs mappings that take in various symbols and produce symbols as output, where symbols are sequences of tokens drawn from a finite vocabulary V . These symbols can be of any length and structure, allowing for great flexibility in representation. The SymbolicAI framework uses these symbols to form more complex expressions, and uses in-context learning to adapt the generative process based on the given instructions, contexts and query statements. A key concept in this framework is the idea of expressions and their polymorphic contexts. An expression is an operation that defines transformations of symbols to create new symbols. This can be as simple as concatenating two symbols or as complex as applying mathematical operations to them. When this operation is evaluated, it produces a new symbol based on specific rules. For example, one rule might define how to perform arithmetic operations on symbols representing numbers and strings. To illustrate this, consider an example where we want to combine the symbols "eight" and "8". We could define a rule that interprets this as addition, resulting in a new symbol "16" as the prediction.
The framework is grounded in formal language theory and grammars, building on the work of linguists and computer scientists Chomsky and Newell (Chomsky, 1959; Newell and Simon, 1972; Hays, 2003). Our NeSy approach uses symbolic methods to define sets of rules for relationships between objects, evoke causal structures, and perform formal evaluations. This allows us to define task-specific operations that query LLMs to guide their predictions and verify the results with other tools and solvers. An LLM can interpret instructions to form associative patterns with the respective data and return a statistically probable answer. Our framework then combines such operations into more complex expressions. These expressions guide the generative process of the LLM and enable efficient domain-specific adaptations even under continual domain changes without the necessity of parameter updates. SymbolicAI thus provides a systematic way to represent and manipulate symbolic information and control generative processes. Our NeSy approach is well suited for enabling enhanced reasoning capabilities and building broad AI systems that verify the results of generated content such as that obtained through LLMs, while still leveraging their powerful language understanding and generalization abilities. This can lead to more robust and versatile AI systems capable of handling a wider range of tasks, setting the foundation for automated AI workflows. Moreover, the symbolic component of NeSy systems can address challenges of computational irreducibility often encountered in complex tasks, allowing for more transparent, interpretable, and discernible results in scenarios where purely neural approaches might lead to unpredictable or difficult-to-verify outcomes.
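The "eight" plus "8" example above can be sketched as a minimal rule-based symbol type. This is illustrative only and is not the actual SymbolicAI API; the names and the word table below are hypothetical:

```python
# Minimal sketch of the symbol/expression idea: symbols are token strings,
# and an operation rule decides how evaluation combines them.
WORDS = {"one": 1, "two": 2, "three": 3, "four": 4,
         "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9}

def as_number(text):
    """Return the numeric reading of a symbol, or None if it has none."""
    try:
        return int(text)
    except ValueError:
        return WORDS.get(text.lower())

class Symbol:
    def __init__(self, value):
        self.value = str(value)

    def __add__(self, other):
        # Rule: if both symbols denote numbers, "+" means arithmetic
        # addition across representations ("eight" + "8" -> "16");
        # otherwise fall back to plain string concatenation.
        a, b = as_number(self.value), as_number(other.value)
        if a is not None and b is not None:
            return Symbol(a + b)
        return Symbol(self.value + other.value)

print((Symbol("eight") + Symbol("8")).value)  # prints 16
```

In the framework itself, such rules are realized by prompting an LLM rather than by a hand-written lookup table; the sketch only mirrors the polymorphic-context idea.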
Computational Graphs, Explainability and Evaluation At the core of the proposed framework is its ability to construct complex computational graphs with symbolic and sub-symbolic approaches that use various expressions. These computational graphs guide the generative process through the in-context learning capabilities of LLMs via functional methods that build, evaluate and execute prompt instructions and contextual patterns. These operations form more complex behaviors, adapting the application behavior based on the data and context.
As a key contribution, we propose a quality measure based on the Fréchet distance and its empirical VERTEX score that evaluates the performance of computational graphs with respect to reference distributions. Additionally, we propose a benchmark to evaluate various LLMs suitable for this NeSy approach.
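Under a Gaussian assumption on embedded outputs, the Fréchet distance underlying such a score has a closed form. A minimal NumPy sketch for the diagonal-covariance case follows (the general case additionally needs a matrix square root of the covariance product; the embeddings below are toy data, not the VERTEX score implementation):

```python
import numpy as np

def frechet_gaussian(mu1, var1, mu2, var2):
    """Squared Frechet distance between Gaussians with diagonal covariances:
    d^2 = ||mu1 - mu2||^2 + sum(var1 + var2 - 2*sqrt(var1*var2))."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    return float(((mu1 - mu2) ** 2).sum()
                 + (var1 + var2 - 2 * np.sqrt(var1 * var2)).sum())

# Embeddings of reference vs. generated outputs (toy 2-D example).
ref = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
gen = np.array([[0.1, 0.0], [1.1, 1.0], [2.1, 0.0]])
d2 = frechet_gaussian(ref.mean(0), ref.var(0), gen.mean(0), gen.var(0))
print(round(d2, 4))  # 0.01: identical spread, means shifted by 0.1 in x
```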

1.5 List of Publications

In this section, the publications that are directly and more loosely connected to this work are listed, with short explanations of the publications and my roles in them. A "*" character indicates shared first authorship.
Directly connected publications. The following publications are directly connected to this work. All publications are full conference papers and have been peer-reviewed.
  1. (Dinu et al., 2024a): M.-C. Dinu, C. Leoveanu-Condrei, M. Holzleitner, W. Zellinger, S. Hochreiter. SymbolicAI: A framework for logic-based approaches combining generative models and solvers. In Third Conference on Lifelong Learning Agents, PMLR, 2024.
  2. (Dinu et al., 2024b): M.-C. Dinu, C. Leoveanu-Condrei, M. Holzleitner, W. Zellinger, S. Hochreiter. SymbolicAI: A framework for logic-based approaches combining generative models and solvers. GenAI4DM Workshop at The Twelfth International Conference on Learning Representations, 2024.
  3. (Dinu et al., 2023): M.-C. Dinu, M. Holzleitner, M. Beck, H. D. Nguyen, A. Huber, H. Eghbal-zadeh, B. A. Moser, S. Pereverzyev, S. Hochreiter, W. Zellinger. Addressing parameter choice issues in unsupervised domain adaptation by aggregation. In International Conference on Learning Representations, 2023.
  4. (Zellinger et al., 2021): W. Zellinger, N. Shepeleva, M.-C. Dinu, H. Eghbal-zadeh, H. D. Nguyen, B. Nessler, S. Pereverzyev, B. A. Moser. The balancing principle for parameter choice in distance-regularized domain adaptation. In Advances in Neural Information Processing Systems, 2021.
(Dinu et al., 2023) is the main publication of our work on parameter choice issues for unsupervised domain adaptation. My main contributions to this work are the implementation of the method, writing of the paper, and development of the benchmark and baseline models. This includes the design and implementation of the IWA model and its training for the domain adaptation experiments, the realization of the baselines, as well as the investigation of approaches for evaluating the aggregation correlation coefficients of the IWA models. (Dinu et al., 2024a) is my most comprehensive publication and focuses on NeSy approaches. My contributions are the conception and implementation of the main approach, as well as the writing of the paper, development of the framework, the proposal and implementation of the VERTEX score, the creation of the benchmark and main parts of the formal derivations.
Loosely connected publications. The following publications are loosely connected to this work. All publications in this section have been peer-reviewed, except for Patel et al. (2024), which is still under review at NeurIPS 2024.
  1. (Patel et al., 2024): A. Patel, M. Hofmarcher, C. Leoveanu-Condrei, M.- C. Dinu, C. Callison-Burch, S. Hochreiter. Large Language Models Can Self-Improve At Web Agent Tasks. Advances in Neural Information Processing Systems (under review), 2024.
  2. (Dinu* et al., 2022): M.-C. Dinu*, M. Hofmarcher*, V. P. Patil, M. Dorfer, P. M. Blies, J. Brandstetter, J. A. Arjona-Medina, S. Hochreiter. XAI and Strategy Extraction via Reward Redistribution. International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers. Springer International Publishing, pages 177-205, 2022.
  3. (Schweighofer* et al., 2022): K. Schweighofer*, A. Radler*, M.-C. Dinu*, M. Hofmarcher, V. Prakash Patil, A. Bitto-Nemling, H. Eghbal-zadeh, S. Hochreiter. A dataset perspective on offline reinforcement learning. First Conference on Lifelong Learning Agents, PMLR, pages 470-517, 2022.
  4. (Patil* et al., 2022): V. P. Patil*, M. Hofmarcher*, M.-C. Dinu, M. Dorfer, P. M. Blies, J. Brandstetter, J. A. Arjona-Medina, S. Hochreiter. Align-rudder: Learning from few demonstrations by reward redistribution. Proceedings of the 39th International Conference on Machine Learning, PMLR volume 162, pages 17531-17572, 2022.
  5. (Steinparz et al., 2022): C. A. Steinparz, T. Schmied, F. Paischer, M.-C. Dinu, V. P. Patil, A. Bitto-Nemling, H. Eghbal-zadeh, S. Hochreiter. Reactive exploration to cope with non-stationarity in lifelong reinforcement learning. First Conference on Lifelong Learning Agents, PMLR, pages 441-469, 2022.
  6. (Siripurapu et al., 2022): R. Siripurapu, V. P. Patil, K. Schweighofer, M.-C. Dinu, T. Schmied, L. E. F. Diez, M. Holzleitner, H. Eghbal-Zadeh, M. K. Kopp, S. Hochreiter. InfODist: Online distillation with Informative rewards improves generalization in Curriculum Learning. Deep Reinforcement Learning Workshop NeurIPS, 2022.
  7. (Schweighofer et al., 2021): K. Schweighofer, M. Hofmarcher, M.-C. Dinu, P. Renz, A. Bitto-Nemling, V. Patil, S. Hochreiter. Understanding the effects of dataset characteristics on offline reinforcement learning. Deep RL Workshop NeurIPS, 2021.
  8. (Holzleitner et al., 2019): M. Holzleitner, J. A. Arjona-Medina, M.-C. Dinu, A. Vall, L. Gruber, S. Hochreiter. A Two Time-Scale Update Rule Ensuring Convergence of Episodic Reinforcement Learning Algorithms at the Example of RUDDER. NeurIPS Optimization Foundations for Reinforcement Learning Workshop, 2019.

Chapter 2 Selected Publications

In this chapter, our main publications of this cumulative thesis are presented: (Zellinger et al., 2021) in Section 2.1, (Dinu et al., 2023) in Section 2.2, and (Dinu et al., 2024b) in Section 2.3. The full list of publications and my contributions are discussed in Section 1.5.
The papers shown in this chapter are shrunk to fit the page layout and otherwise reprinted as originally published, with permission from the corresponding publishers. Copyright for the original publications is held by the respective copyright holders. For the full supplemental materials of (Zellinger et al., 2021), (Dinu et al., 2023), and (Dinu et al., 2024a), please refer to the respective papers at proceedings.neurips.cc and proceedings.mlr.press.
2.1 The balancing principle for parameter choice in distance-regularized domain adaptation

The balancing principle for parameter choice in distance-regularized domain adaptation

Werner Zellinger 1, Natalia Shepeleva 1 Marius-Constantin Dinu 2,3
Hamid Eghbal-zadeh 4,5 Duc Hoan Nguyen 6 Bernhard Nessler 2
Sergei V. Pereverzyev 6 Bernhard A. Moser 1
1 Software Competence Center Hagenberg GmbH
2 Institute for Machine Learning,Johannes Kepler University Linz
3 Dynatrace Research
4 Institute of Computational Perception, Johannes Kepler University Linz
5 LIT AI Lab,Johannes Kepler University Linz
6 Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Sciences
*werner.zellinger@scch.at

Abstract

We address the unsolved algorithm design problem of choosing a justified regularization parameter in unsupervised domain adaptation. This problem is intriguing as no labels are available in the target domain. Our approach starts with the observation that the widely-used method of minimizing the source error, penalized by a distance measure between source and target feature representations, shares characteristics with regularized ill-posed inverse problems. Regularization parameters in inverse problems are optimally chosen by the fundamental principle of balancing approximation and sampling errors. We use this principle to balance learning errors and propose a justified rule for the choice of the regularization parameter. In contrast to the state of the art, our approach allows source and target distributions with disjoint supports. An empirical comparative study on benchmark datasets underpins the performance of our approach.

1 Introduction

Domain adaptation uses the knowledge in a source domain to improve the performance of an algorithm on a related target domain [1]. In particular, domain adaptation tackles domain shifts in machine learning applications: Medical diagnostic systems should be adapted to new physical human variations; Industrial quality inspection systems should be accurate for new products; Self-driving cars should be able to adapt to new geographical environments and weather conditions. In this work, we focus on unsupervised domain adaptation where labels are only available in the source domain.
There are mainly two types of approaches for unsupervised domain adaptation: importance weighting [2,3,4,5,6,7] and feature representation learning [8,9,10,11,12,13] . In this work,we focus on feature representation learning which goes beyond classical importance weighting by allowing a target distribution with support outside of the source distribution. The core idea behind feature representation learning approaches is to map the data into a new feature space where the source and target
35th Conference on Neural Information Processing Systems (NeurIPS 2021). data representations appear similar, and where enough information is preserved for prediction [14]. The similarity is often realized by regularization using distance measures between source and target representations [15,16,17,18,19,20,21] . However,the performance of such methods crucially depends on the choice of the regularization parameter which penalizes the distance. The problem we investigate in this work is to choose this parameter, which is sophisticated without any target labels.
第35届神经信息处理系统会议(NeurIPS 2021)。数据表示看起来相似,并且保留了足够的信息用于预测 [14]。这种相似性通常通过使用源表示和目标表示之间的距离度量进行正则化来实现 [15,16,17,18,19,20,21]。然而,这种方法的性能在很大程度上依赖于正则化参数的选择,该参数对距离进行惩罚。我们在本研究中探讨的问题是如何选择这个参数,而没有任何目标标签的情况下,这一过程是复杂的。
Figure 1: Unsupervised domain adaptation on Transformed Moons. Left: Target data (black dots) lies partially outside of the support of the source data (blue +, orange ×). The common assumption of a bounded density ratio is violated in large regions; in contrast, all our assumptions are satisfied. Our method identifies the best parameter of a domain adaptation algorithm [16] (green solid), which improves over training on source data only (red dashed). Right: Regularization parameter (x-axis) penalizing a distance [25] (purple), leading to models with different source error (blue) and target error (black). Importance weighted validation (IWV) shows the smallest error (red) for models without domain adaptation (αIWV = 0). In contrast, our approach identifies the optimal parameter (αBP = 1).
While remarkable theoretical results have been achieved which quantify the generalization ability of domain adaptation models [8,22,19,20,21], the choice of the regularization parameter, which is crucial for finding such models, has not been systematically addressed. Even though some parameter choice strategies exist, they are either purely heuristically driven or very limited by their assumptions [23]. Typical approaches are fixing the regularization parameters [12], minimizing the source error [16], balancing the source error and a distance [17], multiplying a fixed weighting parameter (e.g. 1 in [16]) by a heuristic schedule value that increases during training, or (importance) weighting the input samples by the ratio between target and source density [3, 24, 23]. One common problem shared among all these approaches is that they can fail if the density ratio is unbounded. Such an unbounded density ratio is typical for many of the high-dimensional problems considered in machine learning [19], e.g. see Figure 1. Besides the aforementioned issues, the lack of principled strategies for parameter choice causes misinterpretations in the ranking of domain adaptation methods, which are traditionally compared by performance while often relying on different parameter choice strategies.
In this work, we propose a principled method for choosing distance-penalizing parameters of feature representation learning approaches for unsupervised domain adaptation. Our approach starts with the observation that the distance-regularization setting of domain adaptation shares characteristics with regularized ill-posed inverse problems (see Table 1). In inverse problems, the regularization parameter can be optimally chosen by the fundamental balancing principle, which optimizes an approximation-sampling (bias-variance) trade-off [26,27,28]. We apply this principle to balance the domain distance and the learning errors of target error bounds. In particular, we approach the problem of non-computable terms in the target error bound by a new algorithmic criterion for approximating the value of balance. We call our method the Balancing Principle for Domain Adaptation (BPDA).
The BPDA is general in the sense that it can be applied based on different target error bounds, e.g. on [8,22,19,20,21]. To the best of our knowledge, the BPDA is the first principled method for parameter choice in unsupervised domain adaptation that allows an unbounded ratio between target and source density. We provide a bound on the generalization error of the best model corresponding to the parameter chosen by the BPDA. Finally, we empirically investigate the behavior of the BPDA based on two target error bounds, different domain adaptation methods and benchmark datasets. Our results show that the BPDA outperforms or is on par with the state of the art on the problem of choosing the regularization parameter, on several domain adaptation methods, applied on different datasets.

2 Summary of results

Notation Let X ⊂ ℝⁿ be an input space and Y be a discrete label space. Following the classical setting of unsupervised domain adaptation [25], we consider two datasets: A source dataset (x,y) = ((x1,lS(x1)),…,(xs,lS(xs))) ∈ (X×Y)^s with inputs x1,…,xs independently drawn according to some source distribution (Borel probability measure) pS on X and labeled according to some labeling function¹ lS: X→Y, and an unlabeled target dataset x′ = (x′1,…,x′t) ∈ X^t with elements independently drawn according to some target distribution pT on X. Throughout this work, we focus on loss functions L: Y×Y → [0,∞) which satisfy L(y,y) = 0. For example, consider the 0-1 loss L(y1,y2) := 1[y1 ≠ y2], where 1[P] is 1 iff the predicate P is true and 0 otherwise, and the quadratic loss L(y1,y2) := |y1 − y2|². We denote the source error by εS(f) = εS(f,lS) with cross-error defined as εS(f,g) := E_{x∼pS}[L(f(x),g(x))] and its empirical sample estimate by ε̂S(f) = ε̂S(f,lS) with ε̂S(f,g) := (1/s)∑_{i=1}^s L(f(xi),g(xi)). We denote the analogously defined target error by εT(f), target cross-error by εT(f,g) and its empirical sample estimate by ε̂T(f) with empirical cross-error ε̂T(f,g). Throughout this work, we focus on target cross-errors εT(f,g) which satisfy the triangle inequality.
Learning setup In this work, we focus on feature representation learning algorithms for domain adaptation. These approaches aim at finding two learning models: A representation mapping ϕ ∈ Φ ⊂ {ϕ: X→R} into some representation space R ⊂ ℝ^m and a classifier g ∈ G ⊂ {g: R→Y}. Loosely speaking, the aim is to find a mapping ϕ under which the source representations ϕ(x) := (ϕ(x1),…,ϕ(xs)) and the target representations ϕ(x′) := (ϕ(x′1),…,ϕ(x′t)) appear similar, and, at the same time, enough information is preserved for prediction [14] by the classifier g. A common approach to realize this aim is to minimize the following objective function [19]
(1) min_{g∈G, ϕ∈Φ} ε̂S(g∘ϕ) + α d(ϕ(x),ϕ(x′))
where d is a distance measure between source and target representations and α ∈ [0,∞) is a parameter². Good choices for d in Eq. (1) have been identified to be the Wasserstein distance [29,30], the Maximum Mean Discrepancy [31, 32], moment distances [17, 33, 18, 34, 35, 36], adversarially learned distances [16, 37] and other measures of divergence [38, 39, 19, 20].
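As an illustration of Eq. (1), the following minimal NumPy sketch evaluates the penalized objective for a fixed linear representation and a fixed classifier, using the squared Maximum Mean Discrepancy [31] as the distance d. The toy data, the identity representation and the threshold classifier are hypothetical choices for illustration; in practice ϕ and g are trained jointly, e.g. by gradient descent.

```python
import numpy as np

def rbf_mmd2(a, b, gamma=1.0):
    """Squared Maximum Mean Discrepancy with an RBF kernel (one candidate d in Eq. (1))."""
    def k(x, y):
        d2 = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

def objective(phi_w, clf, xs, ys, xt, alpha):
    """Empirical source error plus alpha times the representation distance, cf. Eq. (1)."""
    rs, rt = xs @ phi_w, xt @ phi_w          # source/target representations phi(x), phi(x')
    src_err = np.mean(clf(rs) != ys)         # 0-1 source error
    return src_err + alpha * rbf_mmd2(rs, rt)

rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, (50, 2))
ys = (xs[:, 0] > 0).astype(int)
xt = xs + np.array([0.0, 2.0])               # shifted target domain
phi_w = np.eye(2)                            # identity representation for illustration
clf = lambda r: (r[:, 0] > 0).astype(int)    # fixed threshold classifier
print(objective(phi_w, clf, xs, ys, xt, alpha=1.0))
```

Sweeping alpha over a grid of values and retraining (ϕ, g) for each value produces exactly the sequence of models among which the parameter choice problem below must select.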
Problem For some α ∈ [0,∞), let gα∘ϕα denote the minimizer of Eq. (1). Given an increasing sequence of parameters α1,…,αw ∈ [0,∞) with α1 = 0, the problem studied in this work is to choose the parameter α in the sequence α1,…,αw with the lowest target error εT(gα∘ϕα).
Approach Our approach consists of minimizing a target error bound of the form
(2) εT(gα∘ϕα) ≤ D(α) + E(α)
where D(α) gives a notion of domain distance (cf. [25]) by quantifying a distance between source and target data representations, and E(α) comprises different learning errors. We assume that E(α) is bounded by some constant B > 0. The general form in Eq. (2) is satisfied by many error bounds [8,22,20,21,34], which all can be taken as a basis for our approach (more detailed examples are provided in Section 3 and Section 4). One problem that complicates the minimization of these bounds is that E(α) cannot be directly estimated. The BPDA overcomes this problem by a new criterion for estimating the value of balance between the normalized terms E(α) and D(α). The BPDA is detailed in Algorithm 1.
Properties of Algorithm 1 The BPDA has the following striking properties.
  • The BPDA is a general procedure which can be instantiated by any error bound of the form in Eq. (2). See Section 4 and Section 5 for its application based on two different target error bounds [25,20].

1 For simplicity, we use labeling functions instead of the more general concept of conditional distributions.
2 For simplicity, we omit further regularization of ϕ and g.

  • In contrast to state-of-the-art methods, the BPDA does not assume a target labeling function lT that is equal to the source labeling function lS (covariate-shift assumption), and it does not assume a bounded ratio between target and source density (bounded density-ratio assumption). See the supplementary material for a discussion of covariate-shift violations.
  • The learning model gαBP∘ϕαBP identified by the BPDA satisfies a generalization bound, see Section 4. If the learning errors term E(α) is non-decreasing, then the target error of gαBP∘ϕαBP is only a constant factor away from the minimum min_{α∈[0,∞)} D(α) + E(α) of the instantiation bound in Eq. (2).
Algorithm 1: Balancing principle for domain adaptation (BPDA)
Input: Increasing sequence of parameters α1,…,αw ∈ [0,∞) with α1 = 0 and minimizers f1 := gα1∘ϕα1, …, fw := gαw∘ϕαw of Eq. (1).
Output: Parameter αBP solving the problem above.
Initialization: S = {}
for i = 1,…,w do
    Compute empirical cross-error ε̂T(fi,fj) and domain distance D(αj) for all j = 1,…,i−1.
    if ε̂T(fi,fj) ≤ D(αj)(2 + 2B/D(0)) for all j = 1,…,i−1 then
        S := S ∪ {αi}
    end
end
return: αBP := max S
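The selection loop of Algorithm 1 can be sketched directly in code. The following Python sketch assumes the empirical cross-errors and domain distances have already been computed for a grid of parameters; the numbers below are hypothetical illustrations, not results from a real experiment.

```python
import numpy as np

def bpda_select(alphas, cross_err, D, B):
    """Algorithm 1 (BPDA): keep every alpha_i whose cross-errors to all earlier models
    stay below D(alpha_j) * (2 + 2B / D(0)); return the largest kept alpha.
    cross_err[i][j] is the empirical target cross-error between models f_i and f_j."""
    threshold = lambda j: D[j] * (2.0 + 2.0 * B / D[0])
    S = []
    for i in range(len(alphas)):
        if all(cross_err[i][j] <= threshold(j) for j in range(i)):
            S.append(alphas[i])
    return max(S)  # S always contains alpha_1 = 0 (empty condition for i = 0)

# Hypothetical precomputed quantities for five models:
alphas = [0.0, 1e-3, 1e-2, 1e-1, 1.0]
D = [1.0, 0.8, 0.5, 0.3, 0.2]                      # decreasing domain distance
cross_err = [[0.0] * 5 for _ in range(5)]
cross_err[4][0] = 9.9                              # model f_5 drifts far from f_1 ...
print(bpda_select(alphas, cross_err, D, B=3.0))    # ... so alpha_BP = 0.1
```

With B = 3 and D(0) = 1, the threshold against model f_1 is 8, so the drifting model f_5 is rejected and the largest surviving parameter is 0.1.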
Besides the properties above, the BPDA outperforms or is on par with the state of the art on the problem of choosing the regularization parameter, across several domain adaptation methods and datasets; see Section 5.
Related work Approaches which follow Eq. (1) are sometimes interpreted as learning domain-invariant representations. Note that the minimization of d(ϕ(x),ϕ(x′)) in Eq. (1) to achieve ϕ(x) = ϕ(x′) differs from the conception as regularization [40,41]. In fact, minimization of the distance means unjustified over-penalization, which might lead to deteriorated performance [19]. Interestingly, our interpretation as a regularization problem opens up a powerful toolbox of mathematical techniques. Our approach takes up the technique of balancing stability and approximation in regularized ill-posed inverse problems. The balancing principle has its origins in [26], devoted to nonparametric regression estimation, and has been introduced in the context of ill-posed problems [27] and in supervised learning with kernels [42]. Following this line of research, we propose to apply the mathematical techniques underlying the balancing principle in the context of domain adaptation. The most related principled parameter choice methods in the context of unsupervised domain adaptation are importance-weighted cross-validation [3] and its extensions [24,23]. In contrast to these methods, our method is not restricted by the assumption of a bounded ratio between target and source density. One empirically driven method related to ours is [17,43], which aims at balancing the source error and a distance between source and target data representations. However, this method is not theoretically justified as it ignores the minimal combined error of a classifier on representations as defined in [8]. Nevertheless, if the minimal combined error is negligibly small, our results provide a theoretical explanation of the principles underlying the success of [17,43]. Our method relies on bounds on the target error such as, e.g., [8,22,19,20,21,34].

3 Preliminaries

Sampling error bound Throughout this work, we generically denote an upper bound on the sampling error by ηt,G,δ ∈ [0,∞), which is assumed to hold true with probability at least 1−δ:
(3) |εT(f,g) − ε̂T(f,g)| ≤ ηt,G,δ
The bound ηt,G,δ depends on the sample size t, the function class G and the constant δ, and it is assumed to satisfy ηt,G,δ → 0 for t → ∞.
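As one concrete, purely illustrative instantiation of ηt,G,δ (an assumption, not the bound used in this text): for a loss bounded in [0,1] and a fixed pair f, g, Hoeffding's inequality yields ηt,δ = sqrt(log(2/δ)/(2t)), and a finite class can be handled by a union bound over pairs.

```python
import math

def hoeffding_eta(t, delta, n_pairs=1):
    """Hoeffding-style sampling-error bound for a [0,1]-bounded loss,
    with a union bound over n_pairs function pairs when the class is finite.
    An illustrative stand-in for eta_{t,G,delta} in Eq. (3)."""
    return math.sqrt(math.log(2 * n_pairs / delta) / (2 * t))

# The bound shrinks as the target sample size t grows, as required (eta -> 0):
print(round(hoeffding_eta(t=2000, delta=0.05), 4))
```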
Two accompanying target error bounds In the seminal works [8,25], binary classification Y := {0,1} with 0-1 loss is considered, and it is shown that, for symmetric function classes G, the following bound holds with probability at least 1−δ:
(4) εT(g∘ϕ) ≤ εS(g∘ϕ) + λG(ϕ) + d̂GΔG(ϕ(x),ϕ(x′)) + ηt,G,δ
where
(5) d̂GΔG(x,x′) := 2(1 − min_{f,f′∈G}[ (1/s)∑_{i=1}^s 1[f(xi) = f′(xi)] + (1/t)∑_{i=1}^t 1[f(x′i) ≠ f′(x′i)] ])
is the empirical GΔG-divergence [44,25] and λG(ϕ) := inf_{f∈G}(εS(f∘ϕ) + εT(f∘ϕ)) is the minimum possible combined error determined by the application of G. Using Eq. (3), the source error can be further upper bounded by the empirical source error. However, the term λG(ϕ) cannot be estimated based on given datasets as it depends on the unknown labeling function lT.
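To make Eq. (5) concrete, the following sketch computes the empirical GΔG-divergence exactly for a small, finite, symmetric class G of 1-D threshold classifiers and their label flips. This toy class is an assumption for illustration; for rich classes the minimum is approximated, e.g. by a domain classifier [16].

```python
import numpy as np

def gdg_divergence(xs, xt, thresholds):
    """Empirical G-Delta-G divergence of Eq. (5) for a finite symmetric class of
    1-D threshold classifiers x -> 1[x > c] and their flips x -> 1[x <= c]."""
    fs = [(xs > c).astype(int) for c in thresholds] + [(xs <= c).astype(int) for c in thresholds]
    ft = [(xt > c).astype(int) for c in thresholds] + [(xt <= c).astype(int) for c in thresholds]
    # Minimize source agreement plus target disagreement over all pairs (f, f').
    best = min(
        np.mean(f_s == g_s) + np.mean(f_t != g_t)
        for f_s, f_t in zip(fs, ft)
        for g_s, g_t in zip(fs, ft)
    )
    return 2.0 * (1.0 - best)

xs = np.array([-2.0, -1.5, -1.0, -0.5])   # source samples
xt = np.array([0.5, 1.0, 1.5, 2.0])       # target samples, separable from source
print(gdg_divergence(xs, xt, thresholds=[-3.0, 0.0, 3.0]))
```

For separable domains a pair of classifiers can disagree on every source point while agreeing on every target point, so the divergence attains its maximum value 2; for identical samples it is 0.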
In [20], the bound in Eq. (4) is generalized to multiple classes Y := {1,…,k} and scoring functions H ⊂ {h: R→ℝ^k}, where the output on each dimension indicates the confidence of prediction. For some h ∈ H, let us denote by gh: R→Y, z ↦ argmax_{i∈{1,…,k}} h^(i)(z), with h^(i)(z) being the i-th component of h(z). Let further εT denote the target error based on the 0-1 loss. Then, with probability at least 1−δ,
(6) εT(gh∘ϕ) ≤ εS(ρ)(h∘ϕ) + λH(ϕ) + d̂h,H(ρ)(ϕ(x),ϕ(x′)) + ηs+t,H,δ
where εS(ρ)(h) := ∫_X Λρ(ρh(x,lS)) dpS(x) with ρh(x,lS) := 1/2 (h^(lS(x))(x) − max_{y≠lS(x)} h^(y)(x)), and Λρ(x) being 1 − x/ρ if 0 ≤ x < ρ, 0 if ρ ≤ x, and 1 otherwise. Further, the target terms εT(ρ), pT, lT are defined analogously to εS(ρ), pS, lS, respectively; the combined error is defined by λH(ϕ) := inf_{h∈H}(εS(ρ)(h∘ϕ) + εT(ρ)(h∘ϕ)), and the empirical margin disparity discrepancy (MDD) is given by
(7) d̂h,H(ρ)(x,x′) := max_{h′∈H}[ (1/s)∑_{i=1}^s Λρ(ρh′(xi,gh)) − (1/t)∑_{i=1}^t Λρ(ρh′(x′i,gh)) ]
Similarly to Eq. (4), the source error can be further estimated by an empirical error and the combined error λH(ϕ) cannot be estimated based on given data.
Balancing principle for regularized inverse problems Let H and K be two Hilbert spaces, let V: H→K be an operator and g ∈ K; the goal is to find some function f† satisfying Vf = g, see e.g. [45,46,40,41] and references therein. In general, the above problem is ill-posed, i.e. a solution either does not exist, is not unique, or does not depend continuously on g. Existence and uniqueness can be approached by using the following minimizer as an approximation of f†:
f†H ∈ argmin_{f∈H} ‖Vf − g‖²K
However, especially in the case of a noisy operator V̂ and noisy data ĝ, continuous dependency on the data becomes an important issue, which can be restored using Tikhonov regularization [45]:
(8) fα ∈ argmin_{f∈H} ‖V̂f − ĝ‖²K + α‖f‖²H
In many cases, probabilistic bounds on the error can be proven
(9) ‖fα − f†H‖H ≤ S(α) + A(α)
where S(α) is called the sampling error, made by considering noisy approximations V̂ and ĝ of V and g, respectively, and A(α) is called the approximation error, originating from adding the regularizer ‖f‖²H. Commonly, S(α) decreases while A(α) increases with increasing α. We refer to [42] for detailed examples. The fact that S(α) decreases and A(α) increases motivates the so-called balancing principle, which aims at computing α such that S(α) = A(α) [26,27,42,41,28]. As a result, the balancing principle provides a procedure for approximating α without having access to the values of A(α). The balancing principle obtains optimal error rates in many settings [47,48,49,42,41,28].
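In finite dimensions, the Tikhonov-regularized minimizer of Eq. (8) has the closed form fα = (V̂ᵀV̂ + αI)⁻¹V̂ᵀĝ. A minimal NumPy sketch, with a toy operator and data chosen here only for illustration:

```python
import numpy as np

def tikhonov(V_hat, g_hat, alpha):
    """Closed-form minimizer of Eq. (8) for a finite-dimensional operator:
    f_alpha = (V^T V + alpha I)^{-1} V^T g."""
    n = V_hat.shape[1]
    return np.linalg.solve(V_hat.T @ V_hat + alpha * np.eye(n), V_hat.T @ g_hat)

rng = np.random.default_rng(1)
V = rng.normal(size=(20, 3))                 # toy forward operator
f_true = np.array([1.0, -2.0, 0.5])
g_noisy = V @ f_true + 0.01 * rng.normal(size=20)
f_alpha = tikhonov(V, g_noisy, alpha=1e-3)
print(np.round(f_alpha, 2))
```

Increasing alpha shrinks the solution norm (less variance, more bias), which is exactly the trade-off the balancing principle exploits when choosing alpha.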

4 Balancing principle for domain adaptation

In the following, we present the mathematical foundations of Algorithm 1 in two steps. In a first step, in Subsection 4.1, we state our assumptions and detail our idea of choosing the value of balance between terms in a target error bound. In a second step, in Subsection 4.2, we propose an algorithmic criterion, the balancing principle estimate, for approximating the theoretical choice of the balancing value without target labels. Based on this criterion, we explain why we expect Algorithm 1 to be accurate.

4.1 Balancing terms in a target error bound

Assumptions In the following, let gα∘ϕα be a minimizer of Eq. (1) and assume that a target error bound of the form given by Eq. (2) holds. Our approach is based on the plausible assumptions (a) that the function α ↦ E(α) is continuous and bounded by some constant B > 0, and (b) that the function α ↦ D(α) is continuous, non-increasing and non-degenerate, i.e. D(0) > 0. We further make the technical assumption (c) that lim_{α→∞} D(α)/D(0) < sup_{α∈[0,∞)} E(α)/B. Note that (c) is satisfied in most standard cases, where D(α) → 0 for α,t → ∞. For example, consider the two accompanying target error bounds in Section 3 as discussed at the end of this section.
Bridging regularized inverse problems and domain adaptation Under the assumptions above, our domain adaptation learning setup in Section 2 shares characteristics with the setting of regularized inverse problems as described in Section 3. Indeed, the Tikhonov regularizer ‖f‖²H in Eq. (8) plays a role analogous to the distance regularizer d(ϕ(x),ϕ(x′)) in Eq. (1). In addition, error bounds for inverse problems as given by Eq. (9) show a similar form as target error bounds following Eq. (2). The sampling error S(α) in Eq. (9) decreases, similarly to the domain distance in Eq. (2), and the approximation error A(α) in Eq. (9) cannot be estimated, similarly to the learning errors E(α) in Eq. (2). However, in the domain adaptation setting, the term E(α) does not necessarily increase. We approach this issue by considering the least non-decreasing majorant of E(α).
Definition 1 (Least non-decreasing majorant [50]). The least non-decreasing majorant of E(α) is given by Ē(α) := sup_{β∈[0,α]} E(β).
Further upper bounding Eq. (2) by
(10) εT(gα∘ϕα) ≤ D(α) + E(α) ≤ D(α) + Ē(α)
results in a form which satisfies all properties needed to apply the balancing principle, see Table 1.
Table 1: Correspondences between regularized inverse problems and domain adaptation which allow to apply the balancing principle. See Section 3 for details on the inverse problem setting.
Tikhonov-regularized inverse problem | Distance-regularized domain adaptation
fα ∈ argmin_{f∈H} ‖V̂f − ĝ‖²K + α‖f‖²H | gα∘ϕα ∈ argmin_{g∈G, ϕ∈Φ} ε̂S(g∘ϕ) + α d(ϕ(x),ϕ(x′))
‖fα − f†H‖H ≤ S(α) + A(α) | εT(gα∘ϕα) ≤ D(α) + E(α)
decreasing sampling error S(α) | decreasing domain distance D(α)
increasing approximation error A(α) | bounded learning errors E(α)
A(α) not estimable | E(α) not estimable
balance A(α) = S(α) | balance D(α)/D(0) = Ē(α)/B
Balancing value for domain adaptation Having identified the shared characteristics between regularized inverse problems and domain adaptation, we now apply the techniques underlying the balancing principle to domain adaptation. We define the balancing value α∗ as the value achieving
(11) D(α∗)/D(0) = Ē(α∗)/B
The normalizing factor 1/D(0) on the left-hand side of Eq. (11) and the factor 1/B ensure that the two curves D(α)/D(0) and Ē(α)/B intersect. In particular, the existence of α∗ follows from assumptions (a)-(c) above; see Figure 2 for an illustration. Algorithm 1 approximates α∗. If E(α) is non-decreasing and the bound in Eq. (10) holds with equality, then the rate of the target error εT(gα∗∘ϕα∗) is optimal, i.e. εT(gα∗∘ϕα∗) is only a constant factor away from the optimum inf_{α∈[0,∞)} εT(gα∘ϕα). See the supplementary material for a proof. This optimality property is shared with related regularization settings [47,48,49,42,28].

Figure 2: Left: The BPDA in Algorithm 1 overcomes the problem of the unknown learning errors term E(α) by approximating α∗, which balances D(α)/D(0) (green) and the least non-decreasing majorant Ē(α)/B (red dashed) of E(α)/B (red). Right: Average and standard deviation over 10 repetitions of the estimated learning errors E and the domain distance D of the accompanying target error bound Eq. (4), for the models computed by Eq. (1) with the Maximum Mean Discrepancy [51] as distance. The BPDA chooses the value αBP = 10² near the estimated balancing value.
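Numerically, Definition 1 and the balancing value of Eq. (11) can be approximated on a grid by a running maximum and a sign change, as in this sketch (the curves below are made-up illustrations, not estimated quantities from the text):

```python
import numpy as np

def balancing_value(alphas, D, E, B):
    """Approximate alpha* of Eq. (11) on a grid: build the least non-decreasing
    majorant of E (Definition 1) via a running maximum, then return the first
    grid point where E_bar/B reaches D/D(0). Assumes the curves cross
    (guaranteed by assumptions (a)-(c))."""
    E_bar = np.maximum.accumulate(E)     # least non-decreasing majorant of E
    gap = D / D[0] - E_bar / B
    idx = np.argmax(gap <= 0)            # index of the first sign change
    return alphas[idx]

alphas = np.array([0.0, 0.1, 0.2, 0.4, 0.8, 1.6])
D = np.array([1.0, 0.7, 0.5, 0.35, 0.2, 0.1])   # decreasing domain distance
E = np.array([0.1, 0.3, 0.2, 0.5, 0.9, 0.8])    # bounded, not monotone
print(balancing_value(alphas, D, E, B=1.0))
```

The running maximum is exactly why non-monotone learning errors pose no problem: the majorant Ē is non-decreasing by construction, so the normalized curves cross at a well-defined grid point.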
Two accompanying target error bounds Let us now discuss the reasoning above based on the two error bounds described in Section 3. First, recall the target error bound of [25] in Eq. (4). If we take D(α) := d̂GΔG(ϕα(x),ϕα(x′)) + ηt,G,δ and E(α) := εS(gα∘ϕα) + λG(ϕα), it is natural to assume D(α) to decrease with α, especially for adversarial approaches which penalize the empirical GΔG-divergence d̂GΔG(ϕ(x),ϕ(x′)), see [16]. It also holds that E(α) ≤ 3 for all α ∈ [0,∞). For the balancing value to exist for large sample size t, we need to verify that D(α)/D(0) → 0 for t,α → ∞. However, this is satisfied for most function classes Φ, since it can be assumed that a constant function ϕ: x ↦ c ∈ R, which achieves d̂GΔG(ϕ(x),ϕ(x′)) = 0, is contained in Φ. That is, d̂GΔG(ϕ(x),ϕ(x′)) = 0 can be achieved for α → ∞ and consequently D(α)/D(0) → 0 for t,α → ∞. Consider now the bound in Eq. (6). By using D(α) := d̂h,H(ρ)(ϕα(x),ϕα(x′)) + ηs+t,H,δ, E(α) := εS(ρ)(h∘ϕ) + λH(ϕ) and B = 3, all assumptions above are naturally satisfied, similarly to the target error bound in Eq. (4), and the balancing value exists. In Section 5, we empirically investigate the performance of our method based on our two accompanying target error bounds. However, let us first show how we can overcome the problem of the non-computable term E(α).

4.2 Approximation of balancing value

Unfortunately, E(α) in Eq. (2) usually contains information about the target labeling function lT, and α∗ can therefore not be calculated directly.
Balancing principle estimate We propose the following criterion for estimating α∗, which is realized by the BPDA in Algorithm 1.
Definition 2 (Balancing principle estimate). Let α1,…,αw, w ∈ ℕ, with α1 = 0 be an increasing sequence of values in [0,∞) and denote by fi := gαi∘ϕαi. Then, the balancing principle estimate is
(12) αBP := max{ αi ∣ ∀j ∈ {1,…,i−1}: ε̂T(fi,fj) ≤ D(αj)(2 + 2B/D(0)) + ηt,G,δ }
The balancing principle estimate in Definition 2 is based on checking an upper bound on the empirical cross-error ε̂T(fi,fj) between two models fi, fj resulting from two different values αi, αj of the regularization parameter. The empirical cross-error does not contain information about the unknown target labels lT(x′1),…,lT(x′t) and can be computed based on available data. The criterion in Eq. (12) can be explained as follows.
Lemma 1. Let δ ∈ (0,1), α,β ∈ [0,∞) and denote by fα := gα∘ϕα. If 0 ≤ α ≤ β ≤ α∗, then the following holds with probability at least 1−δ:
(13) ε̂T(fα,fβ) ≤ D(α)(2 + 2B/D(0)) + ηt,G,δ
Lemma 1 (see the supplementary material for its proof) shows that the inequality in Eq. (12) is satisfied if αj ≤ αi ≤ α∗. This implies that αi > α∗ if the criterion is violated for some αj ≤ αi. Consequently, the maximum αBP, as defined in Eq. (12) from an increasing sequence α1,…,αw which violates Eq. (13) for some j ∈ {1,…,i−1}, can be assumed to be near α∗.
Generalization guarantee The model gαBP∘ϕαBP obtained by the BPDA in Algorithm 1 satisfies the following generalization guarantee.
Theorem 1. Let δ ∈ (0,1) and let α1,…,αw ∈ [0,∞), α1 = 0, be an increasing sequence such that
(14) D(αl) ≤ q D(αl+1)
for all l ∈ {1,…,w−1} and some q > 1. Then, with probability at least 1−δ,
(15) εT(gαBP∘ϕαBP) ≤ D(α∗)(3 + 3B/D(0)) q + ηt,G,δ
Theorem 1 shows that the target error of the model gαBP∘ϕαBP identified by the BPDA has the same error rate as D(α∗) for t → ∞. Moreover, if the optimum inf_{α∈[0,∞)} D(α) + Ē(α) is achieved, then the error rate is optimal in the sense that εT(gαBP∘ϕαBP) is only a constant factor worse than the minimum inf_{α∈[0,∞)} D(α) + Ē(α). The constant factor is given by (3 + 3B/D(0)) q max{D(0)/B, 1}. That is, the bound is larger for a steeper D between two consecutive values of α (q is larger); a proof is given in the supplementary material. If E(α) is increasing and the instantiation bound in Eq. (2) is tight, i.e. it holds with equality, then the error εT(gαBP∘ϕαBP) is only a constant factor worse than the optimal error inf_{α∈[0,∞)} εT(gα∘ϕα).
It is well known that appropriate assumptions as given above are needed for successful domain adaptation [52,19] . However,in practice,target error bounds are often not tight and optimality cannot be guaranteed. In the following section, we therefore investigate the performance of our method on benchmark datasets based on our two accompanying target error bounds.

5 Empirical evaluations

We empirically investigate the performance of our approach based on two target error bounds, two parameter selection methods, three datasets and different domain adaptation methods³.

5.1 Setup

Given a domain adaptation algorithm that follows Eq. (1), the goal is to identify the regularization parameter $\alpha$ from the sequence $0, 10^{-6}, 10^{-5}, \ldots, 10^{3}, 10^{4}$ which leads to the smallest target error of the model learned by the algorithm.
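For concreteness, the candidate grid above can be written out as follows (a minimal sketch; the variable name `alphas` is ours):

```python
# Candidate regularization parameters: 0 followed by the powers of ten
# from 1e-6 up to 1e4, matching the sequence in the text.
alphas = [0.0] + [10.0 ** k for k in range(-6, 5)]  # 12 candidates in total
```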
Datasets We rely on one academic example which we call Transformed Moons. Transformed Moons exhibits a density ratio that is unbounded in large regions, see Figure 1. We also use the Amazon Reviews dataset [53]. This dataset contains text reviews from four domains: books (B), DVDs (D), electronics (E), and kitchen appliances (K). Reviews are encoded in 5000-dimensional bag-of-words feature vectors of unigrams and bigrams with binary labels: label 0 if the product is ranked by 1 to 3 stars, and label 1 if the product is ranked by 4 or 5 stars. From the four categories we obtain twelve domain adaptation tasks where each category serves once as source domain and once as target domain. Each task provides labeled source examples and 2000 unlabeled target examples for training, and between 3000 and 6000 examples for testing. Our third dataset is the DomainNet-2019 dataset consisting of six different domains, each having 345 classes and an average of 288 images per class, i.e. around 0.6 million images [56]. However, our focus is not on large-scale domain adaptation, but rather on ranking model selection methods; hence, we propose a reduced version of the DomainNet-2019 dataset. In particular, we reduce the number of classes to five. We call our new dataset MiniDomainNet. See the supplementary material for the dataset statistics.

³ The source code can be found at https://github.com/Xpitfire/bpda

Balancing principle for domain adaptation We apply the BPDA in Algorithm 1 based on the two accompanying target error bounds described in Section 3. On the two-class datasets Transformed Moons and Amazon Reviews, we apply the BPDA based on Eq. (4). We set the domain distance $D(\alpha) := \hat{d}_{\mathcal{G}\Delta\mathcal{G}}(\phi_\alpha(\mathbf{x}), \phi_\alpha(\mathbf{x}')) + \eta_{t,\mathcal{G},\delta}$ and approximate its value by a classifier as proposed in [16, Subsection 3.2]. In our experiments on the multi-class dataset MiniDomainNet, we apply the BPDA based on Eq. (6). We define $D(\alpha) := \hat{d}^{(\rho)}_{h,\mathcal{H}}(\phi_\alpha(\mathbf{x}), \phi_\alpha(\mathbf{x}')) + \eta_{s+t,\mathcal{H},\delta}$ and approximate its value as proposed in [20, Subsection 4.2]. In both variants of the BPDA, one for each bound, we follow the argument of [25, Subsection 7.2] that there is enough unlabeled data to evaluate the bounds without considering the finite-sample error term $\eta_{t,\mathcal{G},\delta}$. We repeat each domain adaptation model training several times. The if-statement of the BPDA in Algorithm 1 is considered violated if the statement is violated for at least one of the repetitions. For a fair comparison, the evaluations of IWV and DEV are also based on all repetitions. More precisely, for IWV and DEV, we choose the parameter with the lowest average importance-weighted risk and the lowest average DEV-risk, respectively, where the average is computed over all repetitions.
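The repetition-aware selection rule described above can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: `cross_error`, `D`, and `B` are hypothetical callables and constants standing in for the quantities of Algorithm 1 and the Lemma 1 criterion.

```python
def select_alpha_bp(alphas, cross_error, D, B, n_repetitions=1):
    """Balancing-principle choice: keep increasing alpha while the
    empirical cross-error between the current and all previous models
    stays below the Lemma 1 bound D(alpha_j) * (2 + 2B / D(0));
    stop at the first violation (in any repetition)."""
    chosen = alphas[0]
    for i in range(1, len(alphas)):
        ok = all(
            cross_error(alphas[i], alphas[j], rep)
            <= D(alphas[j]) * (2.0 + 2.0 * B / D(alphas[0]))
            for j in range(i)
            for rep in range(n_repetitions)
        )
        if not ok:  # if-statement violated for at least one repetition
            break
        chosen = alphas[i]
    return chosen
```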
Parameter choice baselines We compare our parameter choice approach to four baselines. The first baseline is training on source data only (SO). The second baseline is the best target error (TB), which serves as a lower bound for the error. The third baseline is importance weighted validation [3] (IWV). We follow [23] and use held-out validation, i.e. we hold out a part of the training data as validation set, and we compute the importance weights based on this validation set. We also follow [57] and [23, Subsection 4.3] and estimate the importance weights by a classifier trained to separate source from target data. The classifier is tuned separately for each task and dataset such that its validation misclassification error is at most 0.05. For MiniDomainNet, we compute the importance weights based on the features of the pre-trained ResNet-18 [58]. The fourth baseline is deep embedded validation [23] (DEV), which solves the unbounded variance issue of IWV. Following [23], DEV is applied on the features of the neural networks.
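For intuition, the importance-weighted validation risk can be sketched as below. This is a sketch under the standard density-ratio estimate from a source-vs-target classifier; the function and argument names are ours, not from the referenced implementations.

```python
import numpy as np

def iwv_risk(val_losses, domain_prob_target, n_source, n_target):
    """Importance-weighted validation risk on held-out *source* data.

    domain_prob_target[i] is the probability (from a source-vs-target
    classifier) that validation point i is a target sample; the ratio
    p/(1-p) times n_source/n_target estimates the density ratio
    p_T(x)/p_S(x), which re-weights the held-out losses."""
    p = np.asarray(domain_prob_target, dtype=float)
    w = (p / (1.0 - p)) * (n_source / n_target)  # density-ratio estimate
    return float(np.mean(w * np.asarray(val_losses, dtype=float)))
```

With equal sample sizes and an uninformative classifier (all probabilities 0.5), the weights are 1 and the IWV risk reduces to the plain validation risk.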
Domain adaptation methods In our experiments, we use three domain adaptation techniques: domain-adversarial neural networks [16] (DANN), Maximum Mean Discrepancy [59, 15] (MMD), and Central Moment Discrepancy [18] (CMD). The details of all neural network architectures used, as well as the training strategy and hyperparameters, are provided in the supplementary material.
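As an illustration of one of these distances, a minimal CMD estimate for features bounded in $[a, b]$ might look as follows. This is a simplified sketch of the central moment discrepancy, not the authors' exact implementation; the default moment order and interval are our choices.

```python
import numpy as np

def cmd(xs, xt, k_max=5, a=0.0, b=1.0):
    """Central Moment Discrepancy sketch for features in [a, b]:
    the distance matches the means plus the central moments of the
    two samples up to order k_max, each rescaled by |b - a|^k."""
    xs, xt = np.asarray(xs, float), np.asarray(xt, float)
    ms, mt = xs.mean(0), xt.mean(0)
    d = np.linalg.norm(ms - mt) / abs(b - a)
    for k in range(2, k_max + 1):
        cs = ((xs - ms) ** k).mean(0)  # k-th central moment, source
        ct = ((xt - mt) ** k).mean(0)  # k-th central moment, target
        d += np.linalg.norm(cs - ct) / abs(b - a) ** k
    return float(d)
```

Identical samples give distance zero; samples with equal means but different higher central moments still give a positive distance.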
Assumptions To evaluate the compliance with the assumptions in Subsection 4.1 for the algorithms CMD and MMD, we estimate $E$, the domain distance $D$, and the least non-decreasing majorant $\bar{E}$ on the Transformed Moons dataset. In particular, the term $E$ is computed using the combined error in Eq. (4), which is unknown in the unsupervised setting. We repeat the evaluations 10 times for different initializations of the domain adaptation model weights.

5.2 Results

Assumptions The terms $E$ and $D$ for MMD on Transformed Moons are shown in Figure 2 and, for CMD, in the supplementary material. The following observations (a)-(d) can be made: (a) $E$ is bounded, the mean curves of $D(\alpha)$ tend to be non-increasing, and the technical assumption $D(\alpha)/D(0) < \sup_{\alpha \in [0,\infty)} E(\alpha)/B$ is satisfied for $\alpha \geq 10^{3}$. That is, the mean curves tend to follow the assumptions listed in Section 4.1. (b) The mean curves of $\bar{E}$ and $E$ tend to be similar. (c) The average parameters chosen by the balancing principle (see $\alpha^{\mathrm{BP}}$ in Figure 2) are the maximum values for which the mean curves of $\bar{E}(\alpha)/B$ are smaller than the mean curves of $D(\alpha)/D(0)$. That is, the BPDA described in Algorithm 1 tends to perform as expected. (d) There is a moderate trend towards a violation of the assumptions for individual runs (see the figure). However, the BPDA is (on average) robust w.r.t. this violation, as it picks a nearly optimal value (see Table 2).
Comparative study Average values of our empirical evaluations on Transformed Moons and Amazon Reviews are summarized in Table 2. For the full tables on Transformed Moons, Amazon Reviews and MiniDomainNet, see the supplementary material. Although it makes no assumptions on the ratio between target and source density, the BPDA outperforms the related parameter choice methods (IWV, DEV) on almost all tasks on Transformed Moons and Amazon Reviews, and obtains competitive results on MiniDomainNet.
Table 2: Average target misclassification error with best values in bold. See the supplementary material for the full results for all datasets and domain adaptation tasks.
Method |       Transformed Moons        |         Amazon Reviews
       |  SO    IWV   DEV   BPDA   TB   |  SO    IWV   DEV   BPDA   TB
MMD    |  0.21  0.20  0.34  0.16  0.16  |  0.27  0.25  0.25  0.22  0.21
DANN   |  0.18  0.18  0.17  0.12  0.12  |  0.28  0.28  0.32  0.27  0.24
CMD    |  0.21  0.20  0.19  0.19  0.18  |  0.28  0.24  0.25  0.21  0.21
Avg.   |  0.20  0.19  0.23  0.16  0.15  |  0.28  0.26  0.27  0.23  0.22

6 Conclusion and future work

One widely used technique for unsupervised domain adaptation is to map the data into a new feature space where the source and target data representations appear similar, and where enough information is preserved for prediction [14]. The similarity is often realized by minimizing the source error penalized by a distance measure between source and target representations. One common interpretation of this approach is to learn domain-invariant representations. However, there is a subtle difference between distance regularization, i.e. penalizing the error minimization with a distance as above, and distance minimization, which results in domain-invariant representations. The latter can even lead to deteriorated performance [19]. In contrast, the interpretation as a regularization problem opens up a powerful toolbox of mathematical techniques. We take up the technique of balancing stability and approximation in the regularization of ill-posed inverse problems to tackle the problem of choosing the crucial regularization parameter in distance-regularized domain adaptation. Our approach overcomes the lack of target labels, satisfies a generalization guarantee, and is (to the best of our knowledge) the first theoretically justified method that allows source and target distributions with disjoint supports. Finally, our approach outperforms, or is on par with, the state of the art on the problem of choosing the regularization parameter for several domain adaptation methods applied to different datasets.

Broader impact

Many fields, such as manufacturing, personalized medicine or analytical chemistry, have to handle problems of domain shift together with issues of data limitations. These areas can profit from our research, as it provides a principled way of choosing a justified regularization parameter of unsupervised domain adaptation algorithms. Our method provides a high level of trust by applying mathematical techniques with guarantees originally developed in the area of ill-posed inverse problems. However, in domain adaptation in general, one critical point is the bias stored in the source data (e.g. a past product, previous patient data, a previous spectroscope), since knowledge in this data is used to improve learning on the unlabeled target data (e.g. a new product, new patient data, a new spectroscope). If such a bias is present in one of the source domains, predictions in new target domains might always suffer from it. We therefore suggest not only looking at new data in new target domains but, more importantly, considering dataset bias in already collected labeled source data.

Acknowledgments and Disclosure of Funding

We thank Markus Holzleitner for careful proofreading. We thank the anonymous reviewers for a constructive discussion which formed the basis for the supplementary material's section about limitations, risks and future developments. The research reported in this paper has been funded by the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK), the Federal Ministry for Digital and Economic Affairs (BMDW), and the Province of Upper Austria in the frame of the COMET-Competence Centers for Excellent Technologies Programme and the COMET Module S3AI managed by the Austrian Research Promotion Agency FFG. The LIT AI Lab is financed by the Federal State of Upper Austria. We further acknowledge the partial support of the FFG project AutoQual-I. The ELLIS Unit Linz, the LIT AI Lab and the Institute for Machine Learning are supported by the projects AI-MOTION (LIT-2018-6-YOU-212), DeepToxGen (LIT-2017-3-YOU-003), AI-SNN (LIT-2018-6-YOU-214), DeepFlood (LIT-2019-8-YOU-213), Medical Cognitive Computing Center (MC3), INCONTROL-RL (FFG-881064), PRIMAL (FFG-873979), DL for GranularFlow (FFG-871302), AIRI FG 9-N (FWF-36284, FWF-36235), ELISE (H2020-ICT-2019-3 ID: 951847), and AIDD (MSCA-ITN-2020 ID: 956832). We thank Janssen Pharmaceutica (MaDeSMart, HBC.2018.2287), Audi.JKU Deep Learning Center, TGW LOGISTICS GROUP GMBH, Silicon Austria Labs (SAL), FILL Gesellschaft mbH, Anyline GmbH, Google, ZF Friedrichshafen AG, Robert Bosch GmbH, UCB Biopharma SRL, Merck Healthcare KGaA, Verbund AG, TÜV Austria, and the NVIDIA program with the award GCP19980904.

References
参考文献

[1] Sinno J Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, 2010.
[2] Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227-244, 2000.
[3] Masashi Sugiyama, Shinichi Nakajima, Hisashi Kashima, Paul V Buenau, and Motoaki Kawanabe. Direct importance estimation with model selection and its application to covariate shift adaptation. In Advances in Neural Information Processing Systems, pages 1433-1440, 2008.
[4] Jiayuan Huang, Arthur Gretton, Karsten M Borgwardt, Bernhard Schölkopf, and Alex J Smola. Correcting sample selection bias by unlabeled data. In Advances in Neural Information Processing Systems, pages 601-608, 2007.
[5] Corinna Cortes, Yishay Mansour, and Mehryar Mohri. Learning bounds for importance weighting. In Advances in Neural Information Processing Systems, pages 442-450, 2010.
[6] Corinna Cortes, Mehryar Mohri, and Andrés M Medina. Adaptation based on generalized discrepancy. Journal of Machine Learning Research, 20(1):1-30, 2019.
[7] ER Gizewski, L Mayer, BA Moser, DH Nguyen, S Pereverzyev Jr, SV Pereverzyev, N Shepeleva, and W Zellinger. On a regularization of unsupervised domain adaptation in RKHS. RICAM-Reports, 2021.
[8] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems, pages 137-144.
[9] Hal Daumé III. Frustratingly easy domain adaptation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 256-263, 2007.
[10] Sinno J Pan, Ivor W Tsang, James T Kwok, and Qiang Yang. Domain adaptation via transfer component analysis. IEEE Transactions on Neural Networks, 22(2):199-210, 2011.
[11] Boqing Gong, Kristen Grauman, and Fei Sha. Connecting the dots with landmarks: Discriminatively learning domain-invariant features for unsupervised domain adaptation. In Proceedings of the International Conference on Machine Learning, pages 222-230, 2013.
[12] Eric Tzeng, Judy Hoffman, Ning Zhang, Kate Saenko, and Trevor Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474, 2014.
[13] Mingming Gong, Kun Zhang, Tongliang Liu, Dacheng Tao, Clark Glymour, and Bernhard Schölkopf. Domain adaptation with conditional transferable components. In Proceedings of the International Conference on Machine Learning, pages 2839-2848, 2016.
[14] Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. Dataset Shift in Machine Learning. The MIT Press, 2009.
[15] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the International Conference on Machine Learning, pages 97-105, 2015.
[16] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, Francois Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(Jan):1-35, 2016.
[17] Baochen Sun and Kate Saenko. Deep coral: Correlation alignment for deep domain adaptation. In Proceedings of the European Conference on Computer Vision, pages 443-450, 2016.
[18] Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, and Susanne Saminger-Platz. Central moment discrepancy (cmd) for domain-invariant representation learning. International Conference on Learning Representations, 2017.
[19] Fredrik D Johansson, David Sontag, and Rajesh Ranganath. Support and invertibility in domain-invariant representations. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 527-536, 2019.
[20] Yuchen Zhang, Tianle Liu, Mingsheng Long, and Michael Jordan. Bridging theory and algorithm for domain adaptation. In Proceedings of the International Conference on Machine Learning, pages 7404-7413, 2019.
[21] Victor Bouvier, Philippe Very, Clément Chastagnol, Myriam Tami, and Céline Hudelot. Robust domain adaptation: Representations, weights and inductive bias. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2020.
[22] Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation: Learning bounds and algorithms. Proceedings of the Annual Conference on Learning Theory, 2009.
[23] Kaichao You, Ximei Wang, Mingsheng Long, and Michael Jordan. Towards accurate model selection in deep unsupervised domain adaptation. In Proceedings of the International Conference on Machine Learning, pages 7124-7133, 2019.
[24] Wouter M Kouw, Jesse H Krijthe, and Marco Loog. Robust importance-weighted cross-validation under sample selection bias. In IEEE International Workshop on Machine Learning for Signal Processing, pages 1-6. IEEE, 2019.
[25] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151-175, 2010.
[26] OV Lepskii. On a problem of adaptive estimation in gaussian white noise. Theory of Probability & Its Applications, 35(3):454-466, 1991.
[27] Alexander Goldenshluger and Sergei V Pereverzev. Adaptive estimation of linear functionals in hilbert scales from indirect white noise observations. Probability Theory and Related Fields, 118(2):169-186, 2000.
[28] Shuai Lu, Peter Mathé, and Sergei V Pereverzev. Balancing principle in supervised learning for a general regularization scheme. Applied and Computational Harmonic Analysis, 48(1):123-148, 2020.
[29] Nicolas Courty, Rémi Flamary, Devis Tuia, and Alain Rakotomamonjy. Optimal transport for domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
[30] Uri Shalit, Fredrik D Johansson, and David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. In Proceedings of the International Conference on Machine Learning, pages 3076-3085, 2017.
[31] Mahsa Baktashmotlagh, Mehrtash T Harandi, Brian C Lovell, and Mathieu Salzmann. Unsupervised domain adaptation by domain invariant projection. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pages 769-776, 2013.
[32] Mingsheng Long, Han Zhu, Jianmin Wang, and Michael I Jordan. Unsupervised domain adaptation with residual transfer networks. In Advances in Neural Information Processing Systems, pages 136-144, 2016.
[33] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pages 6629-6640, 2017.
[34] Werner Zellinger, Bernhard A Moser, and Susanne Saminger-Platz. On generalization in moment-based domain adaptation. Annals of Mathematics and Artificial Intelligence, 89(3):333- 369, 2021.
[35] Werner Zellinger, Bernhard A. Moser, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, and Susanne Saminger-Platz. Robust unsupervised domain adaptation for neural networks via moment alignment. Information Sciences, 483:174-191, May 2019.
[36] Werner Zellinger and Bernhard A Moser. On the truncated Hausdorff moment problem under Sobolev regularity conditions. Applied Mathematics and Computation, 400:126057, 2021.
[37] Eric Tzeng, Judy Hoffman, Kate Saenko, and Trevor Darrell. Adversarial discriminative domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7167-7176, 2017.
[38] Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. Domain generalization via invariant feature representation. In Proceedings of the International Conference on Machine Learning, pages 10-18, 2013.
[39] Fuzhen Zhuang, Xiaohu Cheng, Ping Luo, Sinno Jialin Pan, and Qing He. Supervised representation learning: Transfer learning with deep autoencoders. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 4119-4125, 2015.
[40] Heinz W Engl, Martin Hanke, and Andreas Neubauer. Regularization of inverse problems, volume 375. Springer Science & Business Media, 1996.
[41] Shuai Lu and Sergei V Pereverzev. Regularization theory for ill-posed problems: selected topics, volume 58. Walter de Gruyter, 2013.
[42] Ernesto De Vito, Sergei V Pereverzyev, and Lorenzo Rosasco. Adaptive kernel methods using the balancing principle. Foundations of Computational Mathematics, 10(4):455-479, 2010.
[43] Ramin Nikzad-Langerodi, Werner Zellinger, Susanne Saminger-Platz, and Bernhard A Moser. Domain adaptation for regression under beer-lambert's law. Knowledge-Based Systems, 210:106447, 2020.
[44] Daniel Kifer, Shai Ben-David, and Johannes Gehrke. Detecting change in data streams. In Proceedings of the Thirtieth International Conference on Very Large Data Bases-Volume 30, pages 180-191. VLDB Endowment, 2004.
[45] Andrei Nikolaevich Tikhonov. On the solution of ill-posed problems and the method of regularization. In Doklady Akademii Nauk, volume 151, pages 501-504, 1963.
[46] CW Groetsch. The theory of Tikhonov regularization for Fredholm equations. Pitman, Boston, 1984.
[47] Frank Bauer, Sergei Pereverzev, and Lorenzo Rosasco. On regularization algorithms in learning theory. Journal of complexity, 23(1):52-72, 2007.
[48] Andrea Caponnetto and Ernesto De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331–368, 2007.
[49] Steve Smale and Ding-Xuan Zhou. Learning theory estimates via integral operators and their approximations. Constructive approximation, 26(2):153-172, 2007.
[50] Gord Sinnamon. Transferring monotonicity in weighted norm inequalities. Collectanea Mathematica, pages 181-216, 2003.
[51] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(3):723-773, 2012.
[52] Shai Ben-David, Tyler Lu, Teresa Luu, and Dávid Pál. Impossibility theorems for domain adaptation. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pages 129-136, 2010.
[53] John Blitzer, Ryan McDonald, and Fernando Pereira. Domain adaptation with structural correspondence learning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 120-128, 2006.
[54] Minmin Chen, Zhixiang Xu, Kilian Weinberger, and Fei Sha. Marginalized denoising autoencoders for domain adaptation. Proceedings of the International Conference on Machine Learning, pages 767-774, 2012.
[55] Christos Louizos, Kevin Swersky, Yujia Li, Max Welling, and Richard Zemel. The variational fair autoencoder. International Conference on Learning Representations, 2016.
[56] Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1406-1415, 2019.
[57] Steffen Bickel, Michael Brückner, and Tobias Scheffer. Discriminative learning for differing training and test distributions. In Proceedings of the International Conference on Machine Learning, pages 81-88, 2007.
[58] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770-778, 2016.
[59] Arthur Gretton, Karsten M Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex J Smola. A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems, pages 513-520, 2006.

Supplementary material for: The balancing principle for parameter choice in distance-regularized domain adaptation

Werner Zellinger 1, Natalia Shepeleva 1, Marius-Constantin Dinu 2,3
Hamid Eghbal-zadeh 4,5 Duc Hoan Nguyen 6 Bernhard Nessler 2
Sergei V. Pereverzyev 6 Bernhard A. Moser 1
1 Software Competence Center Hagenberg GmbH
2 Institute for Machine Learning, Johannes Kepler University Linz; 3 Dynatrace Research; 4 Institute of Computational Perception, Johannes Kepler University Linz; 5 LIT AI Lab, Johannes Kepler University Linz; 6 Johann Radon Institute for Computational and Applied Mathematics, Austrian Academy of Sciences
*werner.zellinger@scch.at

1 Balancing value of target error bound
1 目标误差界限的平衡值

In this work, we assume that a target error bound is given which satisfies the form

(1) $\varepsilon_T(g_\alpha \circ \phi_\alpha) \leq D(\alpha) + E(\alpha)$

such that (a) $\alpha \mapsto E(\alpha)$ is continuous and bounded by some constant $B > 0$, (b) $\alpha \mapsto D(\alpha) \geq 0$ is continuous, non-increasing and non-degenerate, i.e. $D(0) > 0$, and (c) $\lim_{\alpha \to \infty} D(\alpha)/D(0) < \sup_{\alpha \in [0,\infty)} E(\alpha)/B$. Under these assumptions, the balancing value $\alpha^*$ which achieves

(2) $\frac{D(\alpha^*)}{D(0)} = \frac{\bar{E}(\alpha^*)}{B}$

exists, where $\bar{E}(\alpha)$ refers to the least non-decreasing majorant of $E(\alpha)$.
Definition 1 (Least non-decreasing majorant [1]). The least non-decreasing majorant of $E(\alpha)$ is given by $\bar{E}(\alpha) := \sup_{\beta \in [0,\alpha]} E(\beta)$.
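On a finite grid of $\alpha$ values, the least non-decreasing majorant is simply a running maximum, since the supremum of $E$ over $[0, \alpha]$ equals the cumulative maximum up to $\alpha$. A minimal sketch with an illustrative sequence of our choosing:

```python
import numpy as np

# Least non-decreasing majorant on a grid: a running maximum of E.
E = np.array([0.5, 0.3, 0.7, 0.6, 0.9])
E_bar = np.maximum.accumulate(E)  # [0.5, 0.5, 0.7, 0.7, 0.9]
```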
Interestingly, the terms $D(\alpha^*)$ and $D(\alpha^*) + \bar{E}(\alpha^*)$ evaluated at the balancing value $\alpha^*$ are only a constant factor away from the minimum $\inf_{\alpha \in [0,\infty)} \left( D(\alpha) + \bar{E}(\alpha) \right)$.
Lemma 0. If $\inf_{\alpha \in [0,\infty)} \left( D(\alpha) + \bar{E}(\alpha) \right)$ is achieved, then

(3) $D(\alpha^*) \leq \max\left\{\frac{D(0)}{B}, 1\right\} \inf_{\alpha \in [0,\infty)} \left( D(\alpha) + \bar{E}(\alpha) \right)$

(4) $D(\alpha^*) + \bar{E}(\alpha^*) \leq 2 \max\left\{\frac{D(0)}{B}, \frac{B}{D(0)}\right\} \inf_{\alpha \in [0,\infty)} \left( D(\alpha) + \bar{E}(\alpha) \right)$
35th Conference on Neural Information Processing Systems (NeurIPS 2021).
Lemma 0 proves, under certain assumptions, the optimality of the target error rate for the model $g_{\alpha^*} \circ \varphi_{\alpha^*}$. More precisely, Eq. (4) implies that the error $\varepsilon_T(g_{\alpha^*} \circ \varphi_{\alpha^*})$ is only a constant factor away from the optimum $\inf_{\alpha \in [0,\infty)} \varepsilon_T(g_\alpha \circ \varphi_\alpha)$, provided that this infimum exists, $E(\alpha)$ is non-decreasing, and Eq. (1) holds with equality.
Proof. Denote by $\alpha_{\mathrm{opt}} \in [0,\infty)$ the value achieving the infimum of $D(\alpha) + \overline{E}(\alpha)$. If $\alpha_{\mathrm{opt}} \le \alpha^*$, then the definition of $\alpha^*$ and assumption (b) imply

$$\frac{D(0)}{B}\,\overline{E}(\alpha^*) = D(\alpha^*) \le D(\alpha_{\mathrm{opt}}) \le \inf_{\alpha \in [0,\infty)} \left(D(\alpha) + \overline{E}(\alpha)\right). \tag{5}$$

If $\alpha_{\mathrm{opt}} > \alpha^*$, then the definition of $\alpha^*$ and Definition 1 imply

$$\frac{B}{D(0)}\,D(\alpha^*) = \overline{E}(\alpha^*) \le \overline{E}(\alpha_{\mathrm{opt}}) \le \inf_{\alpha \in [0,\infty)} \left(D(\alpha) + \overline{E}(\alpha)\right). \tag{6}$$

Combining Eq. (5) and Eq. (6) for $D(\alpha^*)$ gives

$$D(\alpha^*) \le \max\left\{\frac{D(0)}{B}, 1\right\} \inf_{\alpha \in [0,\infty)} \left(D(\alpha) + \overline{E}(\alpha)\right). \tag{7}$$

Combining Eq. (5) and Eq. (6) for $\overline{E}(\alpha^*)$ gives

$$\overline{E}(\alpha^*) \le \max\left\{\frac{B}{D(0)}, 1\right\} \inf_{\alpha \in [0,\infty)} \left(D(\alpha) + \overline{E}(\alpha)\right). \tag{8}$$

Summing Eq. (7) and Eq. (8) yields

$$D(\alpha^*) + \overline{E}(\alpha^*) \le \left(\max\left\{\tfrac{D(0)}{B}, 1\right\} + \max\left\{\tfrac{B}{D(0)}, 1\right\}\right) \inf_{\alpha \in [0,\infty)} \left(D(\alpha) + \overline{E}(\alpha)\right) \le 2 \max\left\{\tfrac{B}{D(0)}, \tfrac{D(0)}{B}\right\} \inf_{\alpha \in [0,\infty)} \left(D(\alpha) + \overline{E}(\alpha)\right).$$
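On a finite grid, the balancing value of Eq. (2) can be approximated as the first grid point where $D(\alpha)/D(0)$ drops below $\overline{E}(\alpha)/B$. A minimal sketch; the grid and the `D`, `E`, `B` values below are illustrative placeholders, not numbers from the paper:

```python
def balancing_value(alphas, D, E, B):
    """First grid point where D(a)/D(0) <= E_bar(a)/B, i.e. where the
    two normalized curves of Eq. (2) cross."""
    E_bar, current = [], float("-inf")
    for e in E:                      # least non-decreasing majorant of E
        current = max(current, e)
        E_bar.append(current)
    D0 = D[0]
    for a, d, eb in zip(alphas, D, E_bar):
        if d / D0 <= eb / B:
            return a
    return alphas[-1]

alphas = [0.0, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
D = [1.00, 0.80, 0.50, 0.30, 0.10, 0.05]  # non-increasing distance curve
E = [0.00, 0.10, 0.20, 0.50, 0.80, 0.90]  # bounded learning-error curve
print(balancing_value(alphas, D, E, B=1.0))  # 0.1
```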

2 Criterion for approximating the balancing value

Recall that we assume that the target cross-errors satisfy some concentration inequality

$$\left|\varepsilon_T(f,g) - \hat{\varepsilon}_T(f,g)\right| \le \eta_{t,\mathcal{F},\delta} \tag{9}$$

which holds with probability at least $1-\delta$ uniformly over all $f, g \in \mathcal{F}$ for some $\eta_{t,\mathcal{F},\delta} \in \mathbb{R}$ such that $\eta_{t,\mathcal{F},\delta} \to 0$ for $t \to \infty$. The main criterion used to define the balancing principle is as follows.

Lemma 1. Let $\delta \in (0,1)$, $\alpha, \beta \in [0,\infty)$ and denote by $f_\alpha := g_\alpha \circ \varphi_\alpha$. If $0 \le \alpha \le \beta \le \alpha^*$, then the following holds with probability at least $1-\delta$:

$$\hat{\varepsilon}_T(f_\alpha, f_\beta) \le D(\alpha)\left(2 + \frac{2B}{D(0)}\right) + \eta_{t,\mathcal{G},\delta} \tag{10}$$
Proof of Lemma 1. The following inequalities are all to be understood to hold with probability at least $1-\delta$. For all $\alpha \le \beta \le \alpha^*$, Eq. (9) and the triangle inequality give

$$\hat{\varepsilon}_T(f_\alpha, f_\beta) \le \varepsilon_T(f_\alpha, f_\beta) + \eta_{t,\mathcal{G},\delta} \le \varepsilon_T(f_\alpha) + \varepsilon_T(f_\beta) + \eta_{t,\mathcal{G},\delta}.$$

Using the instantiation bound of the balancing principle in Eq. (1) further implies that

$$\hat{\varepsilon}_T(f_\alpha, f_\beta) \le D(\alpha) + E(\alpha) + D(\beta) + E(\beta) + \eta_{t,\mathcal{G},\delta}.$$

Figure 1: The proof of Lemma 1 is based on the monotonicity of $D(\alpha)/D(0)$ (green) and the monotonicity of the least non-decreasing majorant $\overline{E}(\alpha)/B$ (red dashed) of $E(\alpha)/B$ (red).

Definition 1 of the least non-decreasing majorant gives

$$\hat{\varepsilon}_T(f_\alpha, f_\beta) \le D(\alpha) + \overline{E}(\alpha) + D(\beta) + \overline{E}(\beta) + \eta_{t,\mathcal{G},\delta}.$$

Finally, we follow [2] and use the monotonicity of $D$ and $\overline{E}$ to obtain

$$\hat{\varepsilon}_T(f_\alpha, f_\beta) \le D(\alpha) + \frac{B\,D(\alpha)}{D(0)} + D(\beta) + \frac{B\,D(\beta)}{D(0)} + \eta_{t,\mathcal{G},\delta} \le D(\alpha)\left(2 + \frac{2B}{D(0)}\right) + \eta_{t,\mathcal{G},\delta}.$$

Figure 1 provides a helpful illustration for the last two steps.

3 Generalization guarantee for balancing principle estimate

Our main theorem is stated as follows.

Theorem 1. Let $\delta \in (0,1)$ and let $\alpha_1, \ldots, \alpha_w \in [0,\infty)$, $\alpha_1 = 0$, be an increasing sequence such that

$$D(\alpha_l) \le q\,D(\alpha_{l+1}) \tag{11}$$

for all $l \in \{1, \ldots, w-1\}$ and some $q > 1$. Then, with probability at least $1-\delta$,

$$\varepsilon_T(g_{\alpha_{\mathrm{BP}}} \circ \varphi_{\alpha_{\mathrm{BP}}}) \le D(\alpha^*)\left(3 + \frac{3B}{D(0)}\right) q + \eta_{t,\mathcal{G},\delta}. \tag{12}$$
The following proof of Theorem 1 follows arguments from the principle of balancing stability and approximation in the theory of regularized ill-posed inverse problems. See Theorem 1 in [3] for a similar application to the adaptive choice of parameters in kernel regression.
Proof of Thm. 1. Let us denote by

$$\bar{\alpha} := \max\left\{\alpha_i \,\middle|\, \overline{E}(\alpha_i) \le \frac{B\,D(\alpha_i)}{D(0)},\; i \in \{1, \ldots, w\}\right\}$$

and by $f_\alpha := g_\alpha \circ \varphi_\alpha$. From Eq. (10) we obtain for all $j \in \{1, \ldots, w\}$ such that $\alpha_j \le \bar{\alpha} \le \alpha^*$, with probability at least $1-\delta$,

$$\hat{\varepsilon}_T(f_{\alpha_j}, f_{\bar{\alpha}}) \le D(\alpha_j)\left(2 + \frac{2B}{D(0)}\right) + \eta_{t,\mathcal{G},\delta}. \tag{13}$$

Note that $\bar{\alpha} \in \{\alpha_1, \ldots, \alpha_w\}$ and that $\alpha_{\mathrm{BP}}$ is the maximum of all $\alpha_i \in \{\alpha_1, \ldots, \alpha_w\}$ satisfying

$$\hat{\varepsilon}_T(f_{\alpha_i}, f_{\alpha_j}) \le D(\alpha_j)\left(2 + \frac{2B}{D(0)}\right) + \eta_{t,\mathcal{G},\delta} \tag{14}$$

for all $j \in \{1, \ldots, i-1\}$. It follows that $\bar{\alpha} \le \alpha_{\mathrm{BP}}$. Moreover, with probability at least $1-\delta$,

$$\begin{aligned}
\varepsilon_T(f_{\alpha_{\mathrm{BP}}}) &\le \varepsilon_T(f_{\alpha_{\mathrm{BP}}}, f_{\bar{\alpha}}) + \varepsilon_T(f_{\bar{\alpha}}) \\
&\le D(\bar{\alpha})\left(2 + \tfrac{2B}{D(0)}\right) + \eta_{t,\mathcal{G},\delta} + \varepsilon_T(f_{\bar{\alpha}}) \\
&\le D(\bar{\alpha})\left(2 + \tfrac{2B}{D(0)}\right) + \eta_{t,\mathcal{G},\delta} + D(\bar{\alpha}) + E(\bar{\alpha}) \\
&\le D(\bar{\alpha})\left(2 + \tfrac{2B}{D(0)}\right) + \eta_{t,\mathcal{G},\delta} + D(\bar{\alpha}) + \tfrac{B\,D(\bar{\alpha})}{D(0)} \\
&= D(\bar{\alpha})\left(3 + \tfrac{3B}{D(0)}\right) + \eta_{t,\mathcal{G},\delta},
\end{aligned}$$

where we used the triangle inequality and Lemma 1 to prove the first two inequalities, followed by Eq. (1) and the same monotonicity argument as used in the proof of Lemma 1; see also Figure 1.

Finally, let $l$ be such that $\bar{\alpha} =: \alpha_l \le \alpha^* \le \alpha_{l+1}$. Since $D$ is non-increasing, we obtain $q\,D(\alpha^*) \ge q\,D(\alpha_{l+1})$ and, by assumption, $q\,D(\alpha_{l+1}) \ge D(\alpha_l) = D(\bar{\alpha})$. The final inequality is shown by recalling that $\bar{\alpha} \le \alpha^*$.
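The selection rule for $\alpha_{\mathrm{BP}}$ in Eq. (14) can be sketched as follows; the grid, distance values and cross-error matrix below are hypothetical placeholders, and `eta` stands for $\eta_{t,\mathcal{G},\delta}$:

```python
def bpda_select(alphas, D, cross_err, B, eta):
    """Return the largest alpha_i whose empirical cross-errors to all
    earlier models stay within the Eq. (14) threshold
    D(alpha_j) * (2 + 2B/D(0)) + eta."""
    D0 = D[0]
    best = alphas[0]  # alpha_1 = 0 always qualifies (no earlier models)
    for i in range(len(alphas)):
        if all(cross_err[i][j] <= D[j] * (2 + 2 * B / D0) + eta
               for j in range(i)):
            best = alphas[i]
    return best

# Hypothetical 3-point grid: the last model's cross-error to the first
# model exceeds the threshold, so the selection falls back to 0.1.
alphas = [0.0, 0.1, 1.0]
D = [1.0, 0.5, 0.2]
cross_err = [[0.0, 0.0, 0.0],
             [0.3, 0.0, 0.0],
             [5.0, 0.1, 0.0]]
print(bpda_select(alphas, D, cross_err, B=1.0, eta=0.05))  # 0.1
```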

4 MiniDomainNet dataset

The parameter selection methods for domain adaptation require training several models with various parameters. Furthermore, the evaluation of such methods includes applying various domain adaptation techniques, which results in high computational demand when using large-scale datasets. In order to reduce the computational resources needed in this area, while keeping the difficulty of working with high-resolution images and with a problem spanning several domains, we fork a smaller version of the DomainNet dataset [4], which we call the MiniDomainNet dataset. MiniDomainNet makes research on parameter selection for domain adaptation more accessible by significantly reducing the computational needs, while providing a challenging and sufficiently large test bed for evaluating deep models.
The DomainNet dataset consists of approximately 0.6 million images divided into 6 domains (Quickdraw, Real, Clipart, Sketch, Infograph, and Painting), with each domain having 345 classes. The average image count per class across all domains of DomainNet is approx. 288. We curate the MiniDomainNet dataset from the DomainNet dataset as follows. We select the five largest classes in the training set of DomainNet, based on the highest average image count per class across all domains. This selection process results in a dataset with the largest amount of training data per class, which is ideal for training deep models.
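The class-selection step can be sketched as below; the domain and class names are hypothetical toy values, not DomainNet statistics:

```python
from collections import defaultdict

def top_classes(image_counts, k=5):
    """image_counts[domain][cls] -> number of training images.
    Rank classes by their average per-domain count, keep the k largest."""
    per_class = defaultdict(list)
    for domain_counts in image_counts.values():
        for cls, n in domain_counts.items():
            per_class[cls].append(n)
    avg = {cls: sum(v) / len(v) for cls, v in per_class.items()}
    return sorted(avg, key=avg.get, reverse=True)[:k]

# Hypothetical two-domain toy counts:
counts = {"real":   {"cat": 900, "dog": 400, "car": 700},
          "sketch": {"cat": 400, "dog": 800, "car": 650}}
print(top_classes(counts, k=2))  # ['car', 'cat']
```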
In our experiments with MiniDomainNet, we follow a recommendation in [4] that uses a combined-source setting. To define our domain adaptation tasks, we select 5 out of the 6 domains and combine them into our combined source dataset. We use the remaining domain as our target dataset. By permuting all source combinations, we then define 6 domain adaptation tasks, which we refer to as combined-source datasets (CS, as denoted in Table 5).
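Enumerating the six combined-source tasks can be sketched as:

```python
domains = ["quickdraw", "real", "clipart", "sketch", "infograph", "painting"]

# One task per held-out target domain: the remaining five domains
# form the combined source (CS).
tasks = [(sorted(d for d in domains if d != target), target)
         for target in domains]

for source, target in tasks:
    print(f"CS({len(source)} domains) -> {target}")
```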
In addition to providing MiniDomainNet, we further address an issue regarding the currently available version of the DomainNet dataset. During our development process, we found that 10 files from class 327 (t-shirt) in the painting domain subset were missing from the file list of the training set (painting_train.txt). We provide a fix for this issue in our source code by inserting the missing class references and their corresponding files. The fix can be found in the source code, in dataloaders/domainnet.py.

5 Extended empirical evaluations

In this section, we provide details of our training setup, the computational resources used to conduct the experiments, the model selection procedures, and our evaluation results.

5.1 Details for training

Transformed Moons On the Transformed Moons dataset, we use a feed-forward network with two fully-connected layers, with 16 nodes each, followed by a ReLU non-linearity. The network is optimized with the Adam optimizer [5] for 250 epochs, with $\beta_1 = 0.9$, $\beta_2 = 0.999$ and an initial learning rate of 0.01, using a MultiStep scheduler which halves the learning rate at epochs 50, 100, and 150. The proxy-A classifiers (required in [6]) consist of one fully-connected layer with 16 nodes and are trained with the Adam optimizer for 200 epochs, with $\beta_1 = 0.9$, $\beta_2 = 0.999$, an initial learning rate of 0.01, and a MultiStep scheduler halving the learning rate at epochs 50, 100, and 150.
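The MultiStep halving rule can be sketched independently of PyTorch (it mirrors what `torch.optim.lr_scheduler.MultiStepLR` with `gamma=0.5` would produce):

```python
def multistep_lr(initial_lr, milestones, gamma, epoch):
    """Learning rate in effect at `epoch` under MultiStep decay:
    multiply by `gamma` once per milestone already reached."""
    return initial_lr * gamma ** sum(epoch >= m for m in milestones)

# Halving the 0.01 initial rate at epochs 50, 100 and 150:
# 0.01 -> 0.005 -> 0.0025 -> 0.00125
for e in (0, 50, 100, 150, 249):
    print(e, multistep_lr(0.01, (50, 100, 150), 0.5, e))
```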
Amazon Reviews For Amazon Reviews, we follow [7] and use a feed-forward network with three fully-connected layers, with 100 nodes each, and sigmoid non-linearities. The optimizer, learning rate, and scheduler are the same as in the Transformed Moons experiments (see above). We train each model for 50 epochs for the CMD and MMD experiments and for 500 epochs for DANN. To estimate the $\mathcal{G}$-divergence, we follow [7] and train a classifier for separating the source sample and the target sample. In particular, we apply two fully-connected layers with 100 nodes each and use the Adam optimizer for 200 epochs, with $\beta_1 = 0.9$, $\beta_2 = 0.999$, an initial learning rate of 0.01, and a MultiStep scheduler halving the learning rate at epochs 50, 100, and 150.
MiniDomainNet Following the pre-trained setup from [4], we use a frozen ResNet-18 backbone model trained on ImageNet [8] and operate subsequent computations on the 512-dimensional extracted features. To alleviate overfitting effects on pre-computed features, we perform data augmentation on each batch and forward the images through the backbone each time. We incorporate zero padding before resizing the images to 256×256 to avoid image distortions. Following the guidance for data augmentation techniques from [9], we perform random resized cropping to 224×224 with a random viewport between 70% and 100% of the original image, random horizontal flipping, color jittering of 0.25% on each RGB channel, and a ±2 degree rotation. After the ResNet-18 backbone output, we add several projection layers and define the domain adaptation layers on which we use the domain adaptation methods to align the representations. The first layers are defined as a common architecture across the different domain adaptation methods. Additional layers are further added for the classification networks, according to the requirements of the individual domain adaptation methods in CMD or MMD. The number of layers/neurons in the upper layers of our architecture has been tuned in order to achieve the best performance in the source-only setup. See Table 1 for a detailed description of the architecture used. We perform experiments on all 6 domain adaptation tasks as defined in Section 5.4 for each of the previously listed methods. All methods have been trained for 50 epochs with the Adam optimizer, an initial learning rate of 0.001, $\beta_1 = 0.9$, $\beta_2 = 0.999$, and a MultiStep learning rate scheduler, halving the learning rate after 15 and 35 epochs.
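The zero-padding step amounts to computing the symmetric padding that makes an image square before the 256×256 resize, so the aspect ratio is preserved. A sketch with a hypothetical helper (`square_pad` is illustrative, not code from our repository):

```python
def square_pad(width, height):
    """Symmetric zero-padding (left, right, top, bottom) that makes an
    image square, so the subsequent resize keeps the aspect ratio."""
    side = max(width, height)
    pad_w, pad_h = side - width, side - height
    return (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)

print(square_pad(300, 200))  # (0, 0, 50, 50): pad 50 px on top and bottom
```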
To apply the balancing principle, we require the training of an additional MDD classifier (see [10]), using the features of the adaptation layer from CMD and MMD, which is further used to calculate the MDD distance. The architecture of the MDD classifier is listed in Table 2. The MDD classifiers are trained with the Adam optimizer, an initial learning rate of 0.0001, and a MultiStep scheduler halving the learning rate after 15 and 25 epochs; in total we run them for 35 epochs. For selecting the disparity parameters we followed the guidance from [11] and set $\gamma = 1.1$ in the MDD training loss and $\rho = 0$ for calculating the MDD distance employed in BPDA.

5.2 Details for computational resources and source code

In the experiments on Transformed Moons and Amazon Reviews, we used two HPC stations with in total 8x NVIDIA TITAN RTX 24 GB, 4x Intel Xeon Scalable Processors Skylake Gold 6130 (2.10 GHz) and Ubuntu 18.04. All methods have been implemented in Python using the PyTorch library [12]. We use the Scikit-learn library [13] for evaluation measures and toy datasets, and the TQDM library [14] and TensorBoard [15] for keeping track of the progress of our experiments.

5.3 Details for model selection

Transformed Moons and Amazon Reviews IWV [16], DEV [11], and BPDA are used to choose the best parameter $\alpha \in \{0, 10^{-6}, 10^{-5}, \ldots, 10^{3}, 10^{4}\}$ for three different distance-regularized domain adaptation methods, namely DANN [7], MMD [17, 18] and CMD [19]. The Transformed Moons and Amazon Reviews datasets contain only 2 classes; hence, BPDA is employed using the
Table 1: Architectural listing of all layers for training on the MiniDomainNet dataset.

Common Architecture
  • Backbone Output Layer: ResNet-18 (Adaptive Average Pooling Layer), output size 512
  • Projection Layers:
    Fully-connected Layer (1024), Batch Normalization 1D Layer, ReLU Activation
    Fully-connected Layer (1024), Batch Normalization 1D Layer, ReLU Activation, Dropout Layer (0.5)
    Fully-connected Layer (1024), Batch Normalization 1D Layer, ReLU Activation, Dropout Layer (0.5)
    Fully-connected Layer (1024), Batch Normalization 1D Layer, ReLU Activation, Dropout Layer (0.5)
  • Adaptation Layers:
    Fully-connected Layer (512), Batch Normalization 1D Layer, ReLU Activation, Dropout Layer (0.5)
    Fully-connected Layer (512), Batch Normalization 1D Layer, ReLU Activation
  • Class Output Layer (CMD): Fully-connected Layer (5)
  • Class Output Layer (MMD): Fully-connected Layer (5)
Table 2: MDD classifier architecture for CMD and MMD.

MDD Classifier
  • Backbone Output Layer: CMD/MMD-Method Adaptation Layer, output size 512
  • Projection Layers: Fully-connected Layer (512), Batch Normalization 1D Layer, ReLU Activation
  • Class Output Layer: Fully-connected Layer (5)
bound introduced in [6]. All the methods (DANN, MMD and CMD) are repeated 10 times for each parameter $\alpha$. The if-statement in the BPDA in Algorithm 1 is considered violated if there is a violation of the statement for at least one of the repetitions. For a fair comparison, the evaluations of IWV and DEV are also based on 10 repetitions. More precisely, for IWV and DEV, we choose the parameter with the lowest average importance weighted risk and the lowest average DEV-risk, respectively, where the average is computed over all 10 repetitions.
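The averaged selection rule for IWV and DEV can be sketched as follows; the risk values below are hypothetical:

```python
def select_by_average_risk(risks):
    """risks maps a candidate parameter to its per-repetition risk
    estimates; IWV/DEV pick the parameter with the lowest mean risk."""
    return min(risks, key=lambda a: sum(risks[a]) / len(risks[a]))

# Hypothetical risks over two repetitions for three candidate alphas:
risks = {0.0: [0.30, 0.28], 1e-3: [0.21, 0.25], 1e-2: [0.26, 0.24]}
print(select_by_average_risk(risks))  # 0.001
```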

MiniDomainNet IWV, DEV, and BPDA are used to choose the best parameter $\alpha \in \{0, 10^{-3}, 10^{-2}, 10^{-1}, 1, 10\}$ for two different distance-regularized domain adaptation methods, namely MMD and CMD. Since the MiniDomainNet dataset contains 5 classes, we use the bound proposed in [10] to instantiate the BPDA. The training procedure is the same as for Transformed Moons and Amazon Reviews.

Figure 2: Average and standard deviation over 10 repetitions of the estimated learning errors $E$ (unknown in unsupervised domain adaptation) and the domain distance $D$ of the accompanying target error bound [6] for distance-regularized domain adaptation models with the Maximum Mean Discrepancy [20] (left) and the Central Moment Discrepancy (right).

5.4 Results

This section provides empirical evidence for the compliance of the empirical settings with the assumptions made in Section 4.1 of the main document, and gives results on the three aforementioned datasets, comparing our approach with the state of the art in parameter selection for domain adaptation.
Verification of assumptions Figure 2 shows the behaviour of the curves $D(\alpha)/D(0)$, $E(\alpha)/B$ and $\overline{E}(\alpha)/B$ for the two methods CMD and MMD and the Transformed Moons dataset. The following observations can be made.
  • $E$ is bounded, the mean curves of $D(\alpha)$ tend to be non-increasing, and the technical assumption $D(\alpha)/D(0) < \sup_{\alpha \in [0,\infty)} E(\alpha)/B$ is satisfied for $\alpha \ge 10^3$. That is, the mean curves tend to follow the assumptions.
  • The mean curves $\overline{E}$ and $E$ tend to be similar. That is, the risk described in the main document when considering label shift does not apply.
  • The average parameters chosen by the balancing principle (see $\alpha_{\mathrm{BP}}$ in the figure) lie near the intersection of $\overline{E}(\alpha)/B$ and $D(\alpha)/D(0)$. That is, the BPDA described in Algorithm 1 tends to perform as expected.
  • There is a small trend towards a violation of the monotonicity assumption for $D$ in the right sub-figure for CMD. However, the BPDA is (on average) robust w.r.t. this violation as it picks nearly the optimal value. The corresponding numbers can be found in Table 2.
Transformed Moons The results are provided in Table 3. It can be observed that BPDA achieves the lowest average classification error among all methods, over all domain adaptation techniques.
Table 3: Average target classification error (and standard deviation) for different regularization parameter choices on the Transformed Moons dataset. 10 repetitions with different random initialization of model weights are used to estimate the importance weighted risk, the DEV risk and the BPDA. The BPDA is computed using the bound in [6].

Method | SO | IWV | DEV | BPDA | TB
MMD  | 0.205 (±0.025) | 0.199 (±0.031) | 0.339 (±0.065) | 0.157 (±0.069) | 0.157 (±0.069)
DANN | 0.177 (±0.032) | 0.177 (±0.032) | 0.169 (±0.075) | 0.115 (±0.098) | 0.115 (±0.098)
CMD  | 0.205 (±0.026) | 0.198 (±0.022) | 0.190 (±0.051) | 0.185 (±0.039) | 0.181 (±0.038)
Avg. | 0.196 (±0.028) | 0.191 (±0.028) | 0.232 (±0.064) | 0.152 (±0.069) | 0.151 (±0.068)
Amazon Reviews Table 4 shows the results of the three model selection methods IWV, DEV, and BPDA, which are used to choose the best parameter $\alpha$ in the sequence $0, 10^{-6}, 10^{-5}, \ldots, 10^{3}, 10^{4}$ for three different distance-regularized domain adaptation methods, namely DANN, MMD, and CMD. The 4 domains contained in Amazon Reviews are denoted in the tables as: books (B), DVDs (D), electronics (E) and kitchen appliances (K). BPDA achieves the lowest average classification error across all tasks using the MMD method. These results are consistent across all domain adaptation techniques.
MiniDomainNet Table 5 shows the results of the three model selection methods IWV, DEV, and BPDA. We omitted the experiments with DANN on the MiniDomainNet dataset due to our computational limits. The 6 domains in MiniDomainNet are denoted in the tables as: Quickdraw (Q), Real (R), Sketch (S), Clipart (C), Infograph (I), and Painting (P). Since the source domain is always a combination of all the other domains except the target, we refer to the source as Combined Source (CS). As can be seen, our method achieves the lowest averaged classification error across all tasks using the CMD method. When using MMD, the three parameter selection methods perform very similarly on average, with BPDA and DEV achieving the lowest average error across all tasks.

6 Discussion of risks and limitations

A constructive discussion with anonymous reviewers resulted in the following list of risks and limitations of the proposed BPDA method.
Label shift In this work, we do not assume a unique labeling function $l_S = l_T$ for source and target domain (covariate shift assumption), but $l_S$ and $l_T$ should be similar. In fact, even if the labeling functions are different, $l_S \neq l_T$, the balancing value $\alpha^*$ can be well estimated by the balancing principle estimate $\alpha_{\mathrm{BP}}$, as proven by Lemma 1. However, the quality of the balancing value $\alpha^*$ itself can be negatively affected by excessive label shift. For example, the function $E$ can first increase strongly and then decrease (caused by label shift), which implies an increasing difference between $\overline{E}(\alpha)$ and $E(\alpha)$ for increasing $\alpha$. In such situations, the target error of the model $g_{\alpha^*} \circ \varphi_{\alpha^*}$ can be high, and consequently also that of the model $g_{\alpha_{\mathrm{BP}}} \circ \varphi_{\alpha_{\mathrm{BP}}}$ identified by the BPDA. However, note that our experiments (see also Figure 2) indicate similar values for $E$ and $\overline{E}$.
Loose instantiation bound Our method is agnostic w.r.t. which target error bound is chosen as its basis. However, it is known in the inverse problems literature [21, 3] that a loose bound can lead to low performance of the balancing principle, which also holds for the BPDA. This problem can be approached by choosing target error bounds that take into account the specific domain shift situation; e.g. the bound of [22] is suitable for general domain shift scenarios.
Low performance of all models Situations exist which hinder distance-regularized domain adaptation methods from performing well. One such situation is excessive label shift as discussed above; see also [23] and references therein. Such scenarios can cause all models $f_1, \ldots, f_w$ to be inaccurate or unstable. The BPDA will select a model $f_i \in \{f_1, \ldots, f_w\}$ with low target error $\varepsilon_T(f_i)$ compared to the other models in the set $\{f_1, \ldots, f_w\}$. Nevertheless, in such situations, $f_i$ might still have a high absolute target error $\varepsilon_T(f_i)$ or it might be unstable. As a solution, distance-regularized domain adaptation methods can be applied with modifications, see e.g. [22, 24, 25, 26].
Focus on weight parameter Our theoretical guarantees and the high empirical performance come at the cost of focusing the selection process on the distance-penalizing parameter $\alpha$. This is in contrast to other model selection methods, such as [16, 11, 27, 28, 29], which can select different types of parameters. This limitation can be approached by employing ideas from multi-penalty regularization of inverse problems to combine the BPDA with related approaches [30]. One approach is to explore a grid of admissible values of several parameters by applying the balancing principle with respect to one of them and allowing the others to take all corresponding grid values. As the result of such an application, one obtains a much reduced set of parameter combinations of interest, and one can apply the balancing principle or related approaches w.r.t. the other parameters. See e.g. [31] for a recent application of this approach.

Acknowledgments and Disclosure of Funding

We thank Markus Holzleitner for careful proofreading. We thank the anonymous reviewers for a constructive discussion which formed the basis for the supplementary material's section about limitations, risks and future developments. The research reported in this paper has been funded by the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK), the Federal Ministry for Digital and Economic Affairs (BMDW), and the Province of Upper Austria in the frame of the COMET-Competence Centers for Excellent Technologies Programme and the COMET Module S3AI managed by the Austrian Research Promotion Agency FFG. The LIT AI Lab is financed by the Federal State of Upper Austria. We further acknowledge the partial support of the FFG project AutoQual-I. The ELLIS Unit Linz, the LIT AI Lab and the Institute for Machine Learning are supported by the Federal State of Upper Austria; IARAI is supported by Here Technologies. We thank the projects AI-MOTION (LIT-2018-6-YOU-212), DeepToxGen (LIT-2017-3-YOU-003), AI-SNN (LIT-2018-6-YOU-214), DeepFlood (LIT-2019-8-YOU-213), Medical Cognitive Computing Center (MC3), INCONTROL-RL (FFG-881064), PRIMAL (FFG-873979), S3AI (FFG-872172), DL for GranularFlow (FFG-871302), AIRI FG 9-N (FWF-36284, FWF-36235), ELISE (H2020-ICT-2019-3 ID: 951847), and AIDD (MSCA-ITN-2020 ID: 956832). We thank Janssen Pharmaceutica (MaDeSMart, HBC.2018.2287), Audi.JKU Deep Learning Center, TGW LOGISTICS GROUP GMBH, Silicon Austria Labs (SAL), FILL Gesellschaft mbH, Anyline GmbH, Google, ZF Friedrichshafen AG, Robert Bosch GmbH, UCB Biopharma SRL, Merck Healthcare KGaA, Verbund AG, and the Google Cloud Research Credits program with the award GCP19980904.
Table 4: Average target classification error (and standard deviation) for different regularization parameter choices on the Amazon Reviews dataset. 10 repetitions with different random initialization of model weights are used. The BPDA is computed using the bound in [6].

Task (MMD) | SO | IWV | DEV | BPDA | TB
B→D | 0.225 (±0.004) | 0.190 (±0.004) | 0.211 (±0.005) | 0.190 (±0.004) | 0.190 (±0.004)
B→E | 0.307 (±0.010) | 0.307 (±0.010) | 0.211 (±0.005) | 0.221 (±0.008) | 0.206 (±0.012)
B→K | 0.266 (±0.004) | 0.185 (±0.009) | 0.266 (±0.004) | 0.185 (±0.009) | 0.185 (±0.009)
D→B | 0.278 (±0.008) | 0.240 (±0.007) | 0.268 (±0.006) | 0.243 (±0.006) | 0.230 (±0.007)
D→E | 0.273 (±0.004) | 0.273 (±0.007) | 0.249 (±0.007) | 0.207 (±0.004) | 0.189 (±0.008)
D→K | 0.266 (±0.004) | 0.266 (±0.004) | 0.197 (±0.006) | 0.197 (±0.006) | 0.187 (±0.007)
E→B | 0.306 (±0.003) | 0.306 (±0.003) | 0.310 (±0.007) | 0.295 (±0.013) | 0.282 (±0.014)
E→D | 0.307 (±0.007) | 0.285 (±0.006) | 0.288 (±0.009) | 0.264 (±0.016) | 0.255 (±0.020)
E→K | 0.162 (±0.004) | 0.145 (±0.003) | 0.193 (±0.004) | 0.145 (±0.003) | 0.145 (±0.003)
K→B | 0.337 (±0.007) | 0.337 (±0.007) | 0.334 (±0.006) | 0.290 (±0.010) | 0.261 (±0.010)
K→D | 0.293 (±0.005) | 0.294 (±0.007) | 0.306 (±0.007) | 0.268 (±0.010) | 0.235 (±0.014)
K→E | 0.167 (±0.002) | 0.169 (±0.004) | 0.167 (±0.002) | 0.169 (±0.004) | 0.145 (±0.002)
Avg. | 0.266 (±0.005) | 0.249 (±0.005) | 0.250 (±0.005) | 0.223 (±0.008) | 0.209 (±0.009)
DANN
| Task | SO | IWV | DEV | BPDA | TB |
|------|----|-----|-----|------|----|
| BD | 0.228 (±0.003) | 0.220 (±0.011) | 0.509 (±0.001) | 0.233 (±0.053) | 0.220 (±0.011) |
| BE | 0.322 (±0.009) | 0.327 (±0.007) | 0.498 (±0.000) | 0.313 (±0.081) | 0.235 (±0.017) |
| BK | 0.276 (±0.003) | 0.296 (±0.010) | 0.272 (±0.013) | 0.247 (±0.103) | 0.219 (±0.023) |
| DB | 0.290 (±0.006) | 0.290 (±0.009) | 0.253 (±0.045) | 0.253 (±0.045) | 0.245 (±0.008) |
| DE | 0.284 (±0.004) | 0.274 (±0.003) | 0.299 (±0.007) | 0.252 (±0.084) | 0.221 (±0.013) |
| DK | 0.270 (±0.004) | 0.300 (±0.006) | 0.303 (±0.007) | 0.217 (±0.008) | 0.217 (±0.008) |
| EB | 0.312 (±0.005) | 0.310 (±0.005) | 0.312 (±0.005) | 0.372 (±0.056) | 0.310 (±0.005) |
| ED | 0.317 (±0.007) | 0.313 (±0.006) | 0.313 (±0.006) | 0.327 (±0.075) | 0.277 (±0.031) |
| EK | 0.170 (±0.004) | 0.170 (±0.004) | 0.170 (±0.004) | 0.172 (±0.011) | 0.170 (±0.004) |
| KB | 0.345 (±0.006) | 0.337 (±0.006) | 0.338 (±0.017) | 0.314 (±0.053) | 0.314 (±0.053) |
| KD | 0.313 (±0.003) | 0.360 (±0.005) | 0.360 (±0.005) | 0.298 (±0.053) | 0.296 (±0.019) |
| KE | 0.174 (±0.002) | 0.183 (±0.003) | 0.221 (±0.004) | 0.194 (±0.057) | 0.172 (±0.010) |
| Avg. | 0.275 (±0.005) | 0.282 (±0.006) | 0.321 (±0.010) | 0.266 (±0.057) | 0.241 (±0.017) |
CMD
| Task | SO | IWV | DEV | BPDA | TB |
|------|----|-----|-----|------|----|
| BD | 0.230 (±0.011) | 0.193 (±0.006) | 0.231 (±0.007) | 0.193 (±0.006) | 0.193 (±0.006) |
| BE | 0.319 (±0.013) | 0.309 (±0.010) | 0.308 (±0.009) | 0.218 (±0.011) | 0.218 (±0.011) |
| BK | 0.269 (±0.005) | 0.230 (±0.007) | 0.269 (±0.006) | 0.186 (±0.010) | 0.186 (±0.010) |
| DB | 0.290 (±0.015) | 0.258 (±0.009) | 0.245 (±0.008) | 0.228 (±0.006) | 0.228 (±0.006) |
| DE | 0.280 (±0.009) | 0.267 (±0.007) | 0.280 (±0.006) | 0.203 (±0.007) | 0.203 (±0.007) |
| DK | 0.264 (±0.004) | 0.194 (±0.006) | 0.194 (±0.006) | 0.194 (±0.006) | 0.194 (±0.006) |
| EB | 0.314 (±0.009) | 0.307 (±0.006) | 0.302 (±0.005) | 0.279 (±0.010) | 0.279 (±0.010) |
| ED | 0.320 (±0.020) | 0.287 (±0.006) | 0.287 (±0.006) | 0.258 (±0.014) | 0.258 (±0.014) |
| EK | 0.174 (±0.013) | 0.152 (±0.005) | 0.169 (±0.006) | 0.139 (±0.005) | 0.139 (±0.005) |
| KB | 0.346 (±0.022) | 0.264 (±0.007) | 0.331 (±0.006) | 0.264 (±0.007) | 0.264 (±0.007) |
| KD | 0.314 (±0.013) | 0.248 (±0.006) | 0.248 (±0.006) | 0.248 (±0.006) | 0.248 (±0.006) |
| KE | 0.178 (±0.007) | 0.147 (±0.004) | 0.178 (±0.007) | 0.147 (±0.004) | 0.147 (±0.004) |
| Avg. | 0.275 (±0.012) | 0.238 (±0.007) | 0.254 (±0.007) | 0.213 (±0.008) | 0.213 (±0.008) |
Table 5: Average target classification error (and standard deviation) for different regularization parameter choices on the MiniDomainNet dataset. 3 repetitions with different random initialization of model weights are used to estimate the importance weighted risk, the DEV risk and the BPDA. The BPDA is computed using the bound in [10].
MMD
| Task | SO | IWV | DEV | BPDA | TB |
|------|----|-----|-----|------|----|
| CSQ | 0.568 (±0.007) | 0.629 (±0.023) | 0.629 (±0.023) | 0.629 (±0.023) | 0.568 (±0.007) |
| CSR | 0.068 (±0.009) | 0.102 (±0.020) | 0.102 (±0.020) | 0.098 (±0.006) | 0.068 (±0.009) |
| CSS | 0.309 (±0.010) | 0.307 (±0.001) | 0.324 (±0.021) | 0.296 (±0.012) | 0.296 (±0.012) |
| CSC | 0.246 (±0.016) | 0.264 (±0.013) | 0.264 (±0.013) | 0.264 (±0.013) | 0.246 (±0.016) |
| CSI | 0.605 (±0.012) | 0.577 (±0.004) | 0.564 (±0.001) | 0.589 (±0.021) | 0.564 (±0.001) |
| CSP | 0.178 (±0.006) | 0.212 (±0.012) | 0.202 (±0.011) | 0.214 (±0.007) | 0.178 (±0.006) |
| Avg. | 0.329 (±0.010) | 0.349 (±0.012) | 0.348 (±0.015) | 0.348 (±0.014) | 0.320 (±0.009) |
CMD
| Task | SO | IWV | DEV | BPDA | TB |
|------|----|-----|-----|------|----|
| CSQ | 0.568 (±0.007) | 0.568 (±0.007) | 0.812 (±0.000) | 0.410 (±0.008) | 0.410 (±0.008) |
| CSR | 0.068 (±0.009) | 0.068 (±0.009) | 0.841 (±0.000) | 0.100 (±0.010) | 0.068 (±0.009) |
| CSS | 0.309 (±0.010) | 0.305 (±0.012) | 0.875 (±0.000) | 0.298 (±0.005) | 0.298 (±0.005) |
| CSC | 0.246 (±0.016) | 0.257 (±0.023) | 0.850 (±0.000) | 0.282 (±0.032) | 0.246 (±0.016) |
| CSI | 0.605 (±0.012) | 0.556 (±0.031) | 0.883 (±0.000) | 0.601 (±0.016) | 0.556 (±0.031) |
| CSP | 0.178 (±0.006) | 0.246 (±0.022) | 0.986 (±0.000) | 0.293 (±0.019) | 0.178 (±0.006) |
| Avg. | 0.329 (±0.010) | 0.333 (±0.017) | 0.875 (±0.000) | 0.331 (±0.015) | 0.293 (±0.013) |

References

[1] Gord Sinnamon. Transferring monotonicity in weighted norm inequalities. Collectanea Mathematica, pages 181-216, 2003.
[2] OV Lepskii. On a problem of adaptive estimation in gaussian white noise. Theory of Probability & Its Applications, 35(3):454-466, 1991.
[3] Ernesto De Vito, Sergei V Pereverzyev, and Lorenzo Rosasco. Adaptive kernel methods using the balancing principle. Foundations of Computational Mathematics, 10(4):455-479, 2010.
[4] Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pages 1406-1415, 2019.
[5] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[6] Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151-175, 2010.
[7] Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, Francois Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(Jan):1-35, 2016.
[8] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097-1105, 2012.
[9] Connor Shorten and Taghi M Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1):1-48, 2019.
[10] Yuchen Zhang, Tianle Liu, Mingsheng Long, and Michael Jordan. Bridging theory and algorithm for domain adaptation. In Proceedings of the International Conference on Machine Learning, pages 7404-7413, 2019.
[11] Kaichao You, Ximei Wang, Mingsheng Long, and Michael Jordan. Towards accurate model selection in deep unsupervised domain adaptation. In Proceedings of the International Conference on Machine Learning, pages 7124-7133, 2019.
[12] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703, 2019.
[13] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
[14] Casper O da Costa-Luis. tqdm: A fast, extensible progress meter for python and cli. Journal of Open Source Software, 4(37):1277, 2019.
[15] Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[16] Masashi Sugiyama, Shinichi Nakajima, Hisashi Kashima, Paul V Buenau, and Motoaki Kawanabe. Direct importance estimation with model selection and its application to covariate shift adaptation. In Advances in Neural Information Processing Systems, pages 1433-1440, 2008.
[17] Arthur Gretton, Karsten M Borgwardt, Malte Rasch, Bernhard Schölkopf, and Alex J Smola. A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems, pages 513-520, 2006.
[18] Mingsheng Long, Yue Cao, Jianmin Wang, and Michael Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the International Conference on Machine Learning, pages 97-105, 2015.
[19] Werner Zellinger, Thomas Grubinger, Edwin Lughofer, Thomas Natschläger, and Susanne Saminger-Platz. Central moment discrepancy (cmd) for domain-invariant representation learning. International Conference on Learning Representations, 2017.
[20] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(3):723-773, 2012.
[21] Shuai Lu and Sergei V Pereverzev. Regularization theory for ill-posed problems: selected topics, volume 58. Walter de Gruyter, 2013.
[22] Remi Tachet des Combes, Han Zhao, Yu-Xiang Wang, and Geoffrey J Gordon. Domain adaptation with conditional distribution matching and generalized label shift. Advances in Neural Information Processing Systems, 33, 2020.
[23] Han Zhao, Remi Tachet Des Combes, Kun Zhang, and Geoffrey Gordon. On learning invariant representations for domain adaptation. In Proceedings of the International Conference on Machine Learning, pages 7523-7532, 2019.
[24] Kamyar Azizzadenesheli, Anqi Liu, Fanny Yang, and Animashree Anandkumar. Regularized learning for domain adaptation under label shifts. International Conference on Learning Representations, 2019.
[25] Trung Le, Tuan Nguyen, Nhat Ho, Hung Bui, and Dinh Phung. Lamda: Label matching deep domain adaptation. In Proceedings of the International Conference on Machine Learning, pages 6043-6054, 2021.
[26] Yifan Wu, Ezra Winston, Divyansh Kaushik, and Zachary Lipton. Domain adaptation with asymmetrically-relaxed distribution alignment. In Proceedings of the International Conference on Machine Learning, pages 6872-6881, 2019.
[27] Wouter M Kouw, Jesse H Krijthe, and Marco Loog. Robust importance-weighted cross-validation under sample selection bias. In IEEE International Workshop on Machine Learning for Signal Processing, pages 1-6. IEEE, 2019.
[28] Erheng Zhong, Wei Fan, Qiang Yang, Olivier Verscheure, and Jiangtao Ren. Cross validation framework to choose amongst models and datasets for transfer learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 547-562. Springer, 2010.
[29] Lorenzo Bruzzone and Mattia Marconcini. Domain adaptation problems: A dasvm classification technique and a circular validation strategy. IEEE transactions on pattern analysis and machine intelligence, 32(5):770-787, 2009.
[30] Massimo Fornasier, Valeriya Naumova, and Sergei V Pereverzyev. Parameter choice strategies for multipenalty regularization. SIAM Journal on Numerical Analysis, 52(4):1770-1794, 2014.
[31] Mykola Krasnoschok, Sergei Pereverzyev, Sergii V Siryk, and Nataliya Vasylyeva. Determination of the fractional order in semilinear subdiffusion equations. Fractional Calculus and Applied Analysis, 23(3):694-722, 2020.
2.2 Addressing Parameter Choice Issues in Unsupervised Domain Adaptation by Aggregation

ADDRESSING PARAMETER CHOICE ISSUES IN UNSUPERVISED DOMAIN ADAPTATION BY AGGREGATION

Marius-Constantin Dinu 1,2 Markus Holzleitner 1 Maximilian Beck 1
Duc Hoan Nguyen 5 Andrea Huber 1 Hamid Eghbal-zadeh 1 Bernhard A. Moser 4
Sergei V. Pereverzyev 5 Sepp Hochreiter 1,3 Werner Zellinger 5
1 ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz
2 Dynatrace Research
3 Institute of Advanced Research in Artificial Intelligence
4 Software Competence Center Hagenberg
5 Johann Radon Institute for Computational and Applied Mathematics,Austrian Academy of Sciences

ABSTRACT

We study the problem of choosing algorithm hyper-parameters in unsupervised domain adaptation, i.e., with labeled data in a source domain and unlabeled data in a target domain, drawn from a different input distribution. We follow the strategy of computing several models using different hyper-parameters, and subsequently computing a linear aggregation of the models. While several heuristics exist that follow this strategy, methods are still missing that rely on thorough theories for bounding the target error. To this end, we propose a method that extends weighted least squares to vector-valued functions, e.g., deep neural networks. We show that the target error of the proposed algorithm is asymptotically not worse than twice the error of the unknown optimal aggregation. We also perform a large scale empirical study on several datasets, including text, images, electroencephalogram, body sensor signals and signals from mobile phones. Our method¹ outperforms deep embedded validation (DEV) and importance weighted validation (IWV) on all datasets, setting a new state-of-the-art performance for solving parameter choice issues in unsupervised domain adaptation with theoretical error guarantees. We further study several competitive heuristics, all outperforming IWV and DEV on at least five datasets. However, our method outperforms each heuristic on at least five of seven datasets.

1 INTRODUCTION

The goal of unsupervised domain adaptation is to learn a model on unlabeled data from a target input distribution using labeled data from a different source distribution (Pan & Yang, 2010; Ben-David et al., 2010). If this goal is achieved, medical diagnostic systems can successfully be trained on unlabeled images using labeled images with a different modality (Varsavsky et al., 2020; Zou et al., 2020); segmentation models for natural images can be learned using only labeled data from computer simulations (Peng et al., 2018); natural language models can be learned from unlabeled biomedical abstracts by means of labeled data from financial journals (Blitzer et al., 2006); industrial quality inspection systems can be learned on unlabeled data from new products using data from related products (Jiao et al., 2019; Zellinger et al., 2020).
However, missing target labels combined with distribution shift makes parameter choice a hard problem (Sugiyama et al., 2007; You et al., 2019; Saito et al., 2021; Zellinger et al., 2021; Musgrave et al., 2021). Often, one ends up with a sequence of models, e.g., originating from different hyper-parameter configurations (Ben-David et al., 2007; Saenko et al., 2010; Ganin et al., 2016; Long et al., 2015; Zellinger et al., 2017; Peng et al., 2019). In this work, we study the problem of constructing an optimal linear aggregation of such a sequence of models; as illustrated in Figure 1, the error of such an optimal aggregation is clearly smaller than the error of the best single model in the sequence.

Figure 1: Unsupervised domain adaptation problem (Shimodaira, 2000; Sugiyama et al., 2007; You et al., 2019). Left: Source distribution (solid) and target distribution (dashed). Right: A sequence of different linear models (dashed) is used to find the optimal linear aggregation of the models (solid). Model selection methods (Sugiyama et al., 2007; Kouw et al., 2019; You et al., 2019; Zellinger et al., 2021) cannot outperform the best single model in the sequence, confidence values as used in Zou et al. (2018) are not available, and approaches based on averages or tendencies of majorities of models (Saito et al., 2017) suffer from a high fraction of large-error models in the sequence. In contrast, our approach (dotted-dashed) is nearly optimal. In addition, the model computed by our method provably approaches the optimal linear aggregation for increasing sample size.
Although methods with mathematical error guarantees have been proposed to select the best model in the sequence (Sugiyama et al., 2007; Kouw et al., 2019; You et al., 2019; Zellinger et al., 2021), they are limited by severe assumptions (cf. Wilson & Cook (2020)). Typical aggregation approaches are (a) to learn an aggregation on source data only (Nozza et al., 2016), (b) to learn an aggregation on a set of (unknown) labeled target examples (Xia et al., 2013; Dai et al., 2007; III & Marcu, 2006; Duan et al., 2012), (c) to learn an aggregation on target examples (pseudo-)labeled based on confidence measures of the given models (Zhou et al., 2021; Ahmed et al., 2022; Sun, 2012; Zou et al., 2018; Saito et al., 2017), (d) to aggregate the models based on data-structure specific transformations (Yang et al., 2012; Ha & Youn, 2021), and (e) to use specific (possibly not available) knowledge about the given models, such as information obtained at different time-steps of its gradient-based optimization process (French et al., 2018; Laine & Aila, 2017; Tarvainen & Valpola, 2017; Athiwaratkun et al., 2019; Al-Stouhi & Reddy, 2011) or the information that the given models are trained on different (source) distributions (Hoffman et al., 2018; Rakshit et al., 2019; Xu et al., 2018; Kang et al., 2020; Zhang et al., 2015). One problem shared among all methods mentioned above is that they cannot guarantee a small error, even if the sample size grows to infinity. See Figure 1 for a simple illustrative example.
In this work, we propose (to the best of our knowledge) the first algorithm for computing aggregations of vector-valued models for unsupervised domain adaptation with target error guarantees. We extend the importance weighted least squares algorithm (Shimodaira, 2000) and corresponding recently proposed error bounds (Gizewski et al., 2022) to linear aggregations of vector-valued models. The importance weights are the values of an estimated ratio between target and source density evaluated at the examples. Every method for density-ratio estimation can be used as a basis for our approach, e.g. Sugiyama et al. (2012); Kanamori et al. (2012) and references therein. Our error bound proves that the target error of the computed aggregation is asymptotically at most twice the target error of the optimal aggregation.
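The density-ratio importance weights described above can be produced by any estimator. As a hedged sketch (not the paper's implementation), the standard probabilistic-classifier trick fits a logistic model that separates source from target inputs and converts its class probabilities into ratio estimates via Bayes' rule; all names below are illustrative:

```python
import numpy as np

def estimate_importance_weights(x_source, x_target, steps=2000, lr=0.1):
    """Estimate w(x) = q(x)/p(x) on the source examples: fit a logistic
    model separating source (label 0) from target (label 1) inputs, then
    convert the probabilities into density-ratio estimates."""
    X = np.vstack([x_source, x_target])
    X = np.hstack([X, np.ones((len(X), 1))])  # bias column
    d = np.concatenate([np.zeros(len(x_source)), np.ones(len(x_target))])
    theta = np.zeros(X.shape[1])
    for _ in range(steps):  # plain gradient descent on the logistic loss
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        theta -= lr * X.T @ (p - d) / len(X)
    p_src = 1.0 / (1.0 + np.exp(-X[: len(x_source)] @ theta))
    n, m = len(x_source), len(x_target)
    # q(x)/p(x) is proportional to P(target|x)/P(source|x), corrected for sample sizes
    return (p_src / (1.0 - p_src)) * (n / m)

# toy usage: the target inputs are shifted relative to the source
rng = np.random.default_rng(0)
xs = rng.normal(0.0, 1.0, size=(200, 2))   # source inputs
xt = rng.normal(0.8, 1.0, size=(100, 2))   # shifted target inputs
w = estimate_importance_weights(xs, xt)
```

Source points that look more target-like receive larger weights, which is exactly the reweighting used to correct the source risk.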
In addition, we perform extensive empirical evaluations on several datasets with academic data (Transformed Moons), text data (Amazon Reviews (Blitzer et al., 2006)), images (MiniDomainNet (Peng et al., 2019; Zellinger et al., 2021)), electroencephalography signals (Sleep-EDF (Eldele et al., 2021; Goldberger et al., 2000)), body sensor signals (UCI-HAR (Anguita et al., 2013), WISDM (Kwapisz et al., 2011)), and sensor signals from mobile phones and smart watches (HHAR (Stisen et al., 2015)).
We compute aggregations of models obtained from different hyper-parameter settings of 11 domain adaptation methods (e.g., DANN (Ganin et al., 2016) and Deep-Coral (Sun & Saenko, 2016)). Our method sets a new state of the art for methods with theoretical error guarantees, namely importance weighted validation (IWV) (Sugiyama et al., 2007) and deep embedded validation (DEV) (You et al., 2019), on all datasets. We also study (1) classical least squares aggregation on source data only, (2) majority voting on target predictions, (3) averaging over model confidences, and (4) learning based on pseudo-labels. All of these heuristics outperform IWV and DEV on at least five of seven datasets, which is a result of independent interest. In contrast, our method outperforms each heuristic on at least five of seven datasets.
Our main contributions are summarized as follows:
我们的主要贡献总结如下:
  • We propose the (to the best of our knowledge) first algorithm for ensemble learning of vector-valued models in (single-source) unsupervised domain adaptation that satisfies a non-trivial target error bound.
  • We prove that the target error of our algorithm is asymptotically (for increasing sample sizes) at most twice the target error of the unknown optimal aggregation.
  • We outperform IWV and DEV, and therefore set a new state-of-the-art performance for solving parameter choice issues under theoretical target error guarantees.
  • We describe four heuristic baselines which all outperform IWV and DEV on at least five of seven datasets. Our method outperforms each heuristic on at least five of seven datasets.
  • Our method tends to be more stable than others w.r.t. adding inaccurate models to the given sequence of models.

2 RELATED WORK

It is well known that aggregations of models in an ensemble often outperform individual models (Dong et al., 2020). Classical examples of aggregation are Boosting (Schapire, 1990; Breiman, 1998), Bootstrap Aggregating (bagging) (Breiman, 1994; 1996a), and Stacking (Wolpert, 1992; Breiman, 1996b). For example, averages of multiple models pre-trained on data from a distribution different from the target one have recently been shown to achieve state-of-the-art performance on ImageNet (Wortsman et al., 2022), and their good generalization properties can be related to flat minima (Hochreiter & Schmidhuber, 1994; 1997). However, most such methods do not take into account a present distribution shift.
Although some ensemble learning methods exist, which take into account a present distribution shift, in contrast to our work, they are either relying on labeled target data (Nozza et al., 2016; Xia et al., 2013; III & Marcu, 2006; Dai et al., 2007; Mayr et al., 2016), are restricted by fixing the aggregation weights to be the same (Razar & Samothrakis, 2019), make assumptions on the models in the sequence or the corresponding process for learning the models (Yang et al., 2012; Ha & Youn, 2021; French et al., 2018; Laine & Aila, 2017; Tarvainen & Valpola, 2017; Athiwaratkun et al., 2019; Al-Stouhi & Reddy, 2011; Hoffman et al., 2018; Rakshit et al., 2019; Xu et al., 2018; Kang et al., 2020; Zhang et al., 2015), or, learn an aggregation based on the heuristic approach of (pseudo-)labeling some target data based on confidence measures of models in the sequence (Zhou et al., 2021; Ahmed et al., 2022; Sun, 2012; Zou et al., 2018; Saito et al., 2017). Another crucial difference of all methods above is that none of these methods can guarantee a small target error in the general setting (distribution shift, vector valued models, different classes, single source domain) described above, even if the sample size grows to infinity.
Another branch of research comprises methods that aim at selecting the best model in the sequence. Although such methods with error bounds have been proposed for the general setting above (Sugiyama et al., 2007; You et al., 2019; Zellinger et al., 2021), they cannot overcome the limited performance of the best model in the given sequence (cf. Figure 1, and Section 6 in the Supplementary Material of Zellinger et al. (2021)). In contrast, our method can outperform the best model in the sequence, and our empirical evaluations show that this is indeed the case in practical examples. A recent kernel-based algorithm for univariate regression that is similar to ours can be found in Gizewski et al. (2022). However, in contrast to Gizewski et al. (2022), our method allows a much more general form of vector-valued models, which are not necessarily obtained from regularized kernel least squares, and can therefore be applied to practical deep learning tasks.
Our work employs technical tools developed in Caponnetto & De Vito (2007; 2005). In fact, we extend Caponnetto & De Vito (2007; 2005) to deal with importance weighted least squares. Finally, it is important to note Huang et al. (2006), where a core Lemma of our proofs is proposed.

3 AGGREGATION BY IMPORTANCE WEIGHTED LEAST SQUARES

This section gives a summary of the main problem of this paper and our approach. For detailed assumptions and proofs, we refer to Section A of the Supplementary Material.
Notation and Setup Let $X\subset\mathbb{R}^{d_1}$ be a compact input space and $Y\subset\mathbb{R}^{d_2}$ be a compact label space with inner product $\langle\cdot,\cdot\rangle_Y$, such that for the associated norm $\|y\|_Y\le y_0$ holds for all $y\in Y$ and some $y_0>0$. Following Ben-David et al. (2010), we consider two datasets: a source dataset $(\mathbf{x},\mathbf{y})=((x_1,y_1),\ldots,(x_n,y_n))\in(X\times Y)^n$ independently drawn according to some source distribution (probability measure) $p$ on $X\times Y$, and an unlabeled target dataset $\mathbf{x}'=(x'_1,\ldots,x'_m)\in X^m$ with elements independently drawn according to the marginal distribution$^2$ $q_X$ of some target distribution $q$ on $X\times Y$. We further denote by $\mathcal{R}_q(f)=\int_{X\times Y}\|f(x)-y\|_Y^2\,\mathrm{d}q(x,y)$ the expected target risk of a vector-valued function $f:X\to Y$ w.r.t. the least squares loss.
Problem Given a set $f_1,\ldots,f_l:X\to Y$ of models, the labeled source sample $(\mathbf{x},\mathbf{y})$ and the unlabeled target sample $\mathbf{x}'$, the problem considered in this work is to find a model $f:X\to Y$ with a minimal target error $\mathcal{R}_q(f)$.
Main Assumptions We rely (a) on the covariate shift assumption that the source conditional distribution $p(y|x)$ equals the target conditional distribution $q(y|x)$, and (b) on the bounded density ratio assumption that there is a function $\beta:X\to[0,B]$ with $B>0$ such that $\mathrm{d}q_X(x)=\beta(x)\,\mathrm{d}p_X(x)$.
Approach Our goal is to compute the linear aggregation $f=\sum_{i=1}^l c_i f_i$ for $c_1,\ldots,c_l\in\mathbb{R}$ with minimal squared target risk $\mathcal{R}_q\big(\sum_{i=1}^l c_i f_i\big)$. Our approach relies on the fact that
$$\operatorname*{arg\,min}_{c_1,\ldots,c_l\in\mathbb{R}}\;\mathcal{R}_q\Big(\sum_{i=1}^l c_i f_i\Big)=\operatorname*{arg\,min}_{c_1,\ldots,c_l\in\mathbb{R}}\;\int_X\Big\|\sum_{i=1}^l c_i f_i(x)-f_q(x)\Big\|_Y^2\,\mathrm{d}q_X(x)\tag{1}$$
for the regression function given by $f_q(x)=\int_Y y\,\mathrm{d}q(y|x)$,$^3$ see e.g. Cucker & Smale (2002, Proposition 1). Unfortunately, the right hand side of Eq. (1) contains information about the labels via $f_q(x)$, which is not given in our setting of unsupervised domain adaptation. However, borrowing an idea from importance sampling, it is possible to estimate Eq. (1). More precisely, from the covariate shift assumption we get $f_p(x)=\int_Y y\,\mathrm{d}p(y|x)=f_q(x)$, and we can use the bounded density ratio $\beta$ to obtain
$$\operatorname*{arg\,min}_{c_1,\ldots,c_l\in\mathbb{R}}\;\mathcal{R}_q\Big(\sum_{i=1}^l c_i f_i\Big)=\operatorname*{arg\,min}_{c_1,\ldots,c_l\in\mathbb{R}}\;\int_X\beta(x)\Big\|\sum_{i=1}^l c_i f_i(x)-f_p(x)\Big\|_Y^2\,\mathrm{d}p_X(x)\tag{2}$$
which extends importance weighted least squares (Shimodaira, 2000; Kanamori et al., 2009) to linear aggregations $\sum_{i=1}^l c_i f_i$ of vector-valued functions $f_1,\ldots,f_l$. The unique minimizer of Eq. (2) can be approximated from the available data, analogously to classical least squares estimation, as detailed in Algorithm 1. In the following, we call Algorithm 1 Importance Weighted Least Squares Linear Aggregation (IWA).

$^2$ The existence of the conditional probability distribution $q(y|x)$ with $q(x,y)=q(y|x)\,q_X(x)$ is guaranteed by
$^3$ $Y$-valued integrals are defined in the sense of Lebesgue–Bochner.

Relation to Model Selection The optimal aggregation $f^*:=\operatorname*{arg\,min}_{c_1,\ldots,c_l\in\mathbb{R}}\mathcal{R}_q\big(\sum_{i=1}^l c_i f_i\big)$ defined in Eq. (2) is clearly better than any single model selection since
$$\mathcal{R}_q(f^*)=\min_{c_1,\ldots,c_l\in\mathbb{R}}\mathcal{R}_q\Big(\sum_{i=1}^l c_i f_i\Big)\le\min_{c_1,\ldots,c_l\in\{0,1\}}\mathcal{R}_q\Big(\sum_{i=1}^l c_i f_i\Big)\le\min_{i\in\{1,\ldots,l\}}\mathcal{R}_q(f_i).\tag{3}$$
However, the optimal aggregation $f^*$ cannot be computed from finite datasets, and the next logical questions concern the accuracy of the approximation $\tilde f$ in Algorithm 1.
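The chain of inequalities in Eq. (3) can be checked on a toy example. The following snippet is purely illustrative and not from the paper; the regression function and the error term are our own construction. It builds two models with systematic errors of opposite sign around a regression function, so that the aggregation with $c_1=c_2=1/2$ cancels the errors and beats every single model:

```python
import numpy as np

# Toy check of Eq. (3): a linear aggregation can beat every single model.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
f_q = np.sin(np.pi * x)           # regression function (plays the role of fq)
e = 0.3 * np.cos(np.pi * x)       # systematic error term
f1, f2 = f_q + e, f_q - e         # two models with opposite errors

def risk(f):
    """Empirical squared target risk of predictions f."""
    return float(np.mean((f - f_q) ** 2))

agg = 0.5 * f1 + 0.5 * f2         # aggregation with c = (1/2, 1/2)
assert risk(agg) < min(risk(f1), risk(f2))
```

No single model, and hence no selection rule, can reach the risk of the aggregation here, which is exactly the gap between the two minima in Eq. (3).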
Algorithm 1: Importance Weighted Least Squares Linear Aggregation (IWA)
Input: set $f_1,\ldots,f_l:X\to Y$ of models, labeled source sample $(\mathbf{x},\mathbf{y})$ and unlabeled target sample $\mathbf{x}'$.
Output: linear aggregation $\tilde f=\sum_{i=1}^l\tilde c_i f_i$ with weights $\tilde{\mathbf{c}}=(\tilde c_1,\ldots,\tilde c_l)\in\mathbb{R}^l$.
Step 1: Use the unlabeled samples $\mathbf{x}$ and $\mathbf{x}'$ to approximate the density ratio $\mathrm{d}q_X/\mathrm{d}p_X$ by some function $\beta:X\to[0,B]$ using a classical algorithm, e.g. Sugiyama et al. (2012).
Step 2: Compute the weight vector $\tilde{\mathbf{c}}=\tilde G^{-1}\tilde g$ with empirical Gram matrix $\tilde G$ and vector $\tilde g$ defined by
$$\tilde G=\Big(\frac{1}{m}\sum_{k=1}^m\langle f_i(x'_k),f_j(x'_k)\rangle_Y\Big)_{i,j=1}^l,\qquad\tilde g=\Big(\frac{1}{n}\sum_{k=1}^n\beta(x_k)\,\langle y_k,f_i(x_k)\rangle_Y\Big)_{i=1}^l.$$
Return: linear aggregation $\tilde f=\sum_{i=1}^l\tilde c_i f_i$.
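Step 2 of Algorithm 1 amounts to solving one $l\times l$ linear system. The following NumPy sketch is our own illustration, not the authors' implementation; the array shapes and the availability of a precomputed density ratio $\beta$ on the source points are assumptions:

```python
import numpy as np

def iwa_weights(preds_target, preds_source, y_source, beta_source):
    """Step 2 of Algorithm 1 (IWA): solve c = G^{-1} g.

    preds_target : (l, m, d) model predictions on the unlabeled target sample
    preds_source : (l, n, d) model predictions on the labeled source sample
    y_source     : (n, d) vector-valued source labels (e.g. one-hot)
    beta_source  : (n,) estimated density ratio dq_X/dp_X at the source points
    """
    l, m, _ = preds_target.shape
    n = y_source.shape[0]
    # G[i, j] = (1/m) sum_k <f_i(x'_k), f_j(x'_k)>_Y over the target sample
    G = np.einsum('imd,jmd->ij', preds_target, preds_target) / m
    # g[i] = (1/n) sum_k beta(x_k) <y_k, f_i(x_k)>_Y over the source sample
    g = np.einsum('k,kd,ikd->i', beta_source, y_source, preds_source) / n
    return np.linalg.solve(G, g)

def iwa_predict(preds, c):
    """Evaluate the linear aggregation sum_i c_i f_i on stacked predictions."""
    return np.einsum('i,ind->nd', c, preds)
```

Note that $\tilde G$ is computed on the target sample while $\tilde g$ is importance weighted on the source sample, mirroring Algorithm 1.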

4 TARGET ERROR BOUND FOR ALGORITHM 1

Let us start by introducing some further notation: $L_2(p)$ refers to the Lebesgue–Bochner space of functions from $X$ to $Y$, associated to a measure $p$ on $X$, with corresponding inner product $\langle\cdot,\cdot\rangle_{L_2(p)}$ (this space basically consists of all $Y$-valued functions whose $Y$-norms are square integrable with respect to the given measure $p$). Moreover, let us introduce the (positive semi-definite) Gram matrix $G=(\langle f_i,f_j\rangle_{L_2(q_X)})_{i,j=1}^l$ and the vector $\bar g=(\langle\beta f_p,f_i\rangle_{L_2(p_X)})_{i=1}^l$. We can assume that $G$ is invertible (and thus positive definite), since otherwise some models are too similar to others and can be removed from the sequence. The optimal weights are then given by $\mathbf{c}^*=(c_1^*,\ldots,c_l^*)=G^{-1}\bar g$, see Lemma 4.
However, neither $G$ nor the vector $\bar g$ is accessible in practice, because there is no access to the target measure $q_X$. Driven by the law of large numbers, we approximate them by averages over our given data and therefore arrive at the formulas for $\tilde G$ and $\tilde g$ given in Algorithm 1. This leads to the approximation $\tilde f$. Up to this point we have only considered an intuitive perspective on the problem setting; we now formally discuss how the distance between the model $\tilde f$ and the optimal linear model $f^*=\sum_{i=1}^l c_i^* f_i$, measured in terms of target risks, behaves with increasing sample sizes. This is the content of our main result:
Theorem 1. With probability $1-\delta$ it holds that
$$\mathcal{R}_q(\tilde f)-\mathcal{R}_q(f_q)\le 2\big(\mathcal{R}_q(f^*)-\mathcal{R}_q(f_q)\big)+C\Big(\log\frac{1}{\delta}\Big)\big(n^{-1}+m^{-1}\big)\tag{4}$$
for some coefficient $C>0$ not depending on $m$, $n$ and $\delta$, and sufficiently large $m$ and $n$.
Before we give an outline of the proof (see Section A), let us briefly comment on the main message of Algorithm 1. Observe that $\mathcal{R}_q(\tilde f)-\mathcal{R}_q(f_q)=\|\tilde f-f_q\|_{L_2(q_X)}^2$ (Cucker & Smale, 2002, Proposition 1) can be interpreted as the total target error made by Algorithm 1, sometimes called excess risk; in classification, $f_q$ plays the role of the labeling function and the excess risk equals the target error of Ben-David et al. (2010). Eq. (4) compares this error for the aggregation $\tilde f$, computed by Algorithm 1, to the error of the optimal aggregation $f^*$. Note that the error of the optimal aggregation $f^*$ is unavoidable in the sense that it is determined by the decision to search for linear aggregations of $f_1,\ldots,f_l$ only. However, if the models $f_1,\ldots,f_l$ are sufficiently different, then this error can be expected to be small. Theorem 1 tells us that the error of $\tilde f$ approaches the one of $f^*$ with increasing target and source sample size; the rate of convergence is at least linear. Finally, we emphasize that Theorem 1 does not take into account the error of the density-ratio estimation. We refer to the recent work of Gizewski et al. (2022), who, for the first time, included such an error in the analysis of importance weighted least squares.
Let us now give a brief outline of the proof of Theorem 1. One key part concerns the existence of a Hilbert space $H$ with associated inner product $\langle\cdot,\cdot\rangle_H$ (a reproducing kernel space of functions from $X$ to $Y$) which contains all given models $f_1,\ldots,f_l$ and the regression function $f_q=f_p$. The space $H$ can be constructed from any given models that are bounded and continuous functions. Furthermore, Algorithm 1 does not need any knowledge of $H$, which is a modeling assumption needed only for the proofs, so that we can apply many arguments developed in Caponnetto & De Vito (2007; 2005). $H$ is also not necessarily generated by a prescribed kernel such as a Gaussian or linear kernel, and no further smoothness assumption is required; see Sections A and B in the Supplementary Material.
Moreover, in this setting one can express the excess risk as follows: $\mathcal{R}_q(f)-\mathcal{R}_q(f_q)=\|A(f-f_q)\|_H^2$ for some bounded linear operator $A:H\to H$. This also allows us to formulate the entries of $G$ and $\bar g$ in terms of the inner product $\langle\cdot,\cdot\rangle_H$ instead. Using properties of the operators that appear in the construction of $H$, in combination with Hoeffding-like concentration bounds in Hilbert spaces and bounds that measure, e.g., the deviation between empirical averages in the source and target domain (as done in Gretton et al. (2006, Lemma 4)), we can quantify the differences between the entries of $G$ and $\tilde G$ (and of $\bar g$ and $\tilde g$, respectively) in terms of $n$, $m$ and $\delta$. This leads to Eq. (4).

5 EMPIRICAL EVALUATIONS

We now empirically evaluate the performance of our approach compared to classical ensemble learning baselines and state-of-the-art model selection methods. We structure our empirical evaluation as follows: first, we outline our experimental setup for unsupervised domain adaptation and introduce all domain adaptation methods used in our analysis; second, we describe the ensemble learning and model selection baselines; third, we present the datasets used for our experiments. We then conclude with our results and a detailed discussion thereof.
Figure 2: Top: Mean classification accuracy (y-axis) of our method (IWA), source-only regression (SOR), deep embedded validation (DEV) and the individual models (green: source accuracy, orange: target accuracy) used in the aggregation for the HHAR dataset (Stisen et al., 2015) over 3 seeds. The individual models (x-axis) are trained with DIRT (Shu et al., 2018) for different hyper-parameter choices. Bottom: Scaled aggregation weights (y-axis) for the individual models (x-axis) computed by IWA, SOR and DEV (average over 3 seeds). Instead of searching for the best model in the sequence, IWA effectively uses all models in the sequence and obtains a performance not reachable by any procedure selecting only one model.

5.1 EXPERIMENTAL SETUP

To assess the performance of our ensemble learning algorithm IWA (Algorithm 1), we perform numerous experiments with different domain adaptation algorithms on different datasets. By changing the hyper-parameters of each algorithm, we obtain, as results of applying these algorithms, sequences of models. The goal of our method is to find optimal models based on combinations of candidates from each sequence. As domain adaptation algorithms, we consider the AdaTime benchmark suite, and run our experiments on language, image, text and time-series data. This suite comprises a collection of 11 domain adaptation algorithms. We follow their evaluation setup and apply the following algorithms: Adversarial Spectral Kernel Matching (AdvSKM) (Liu & Xue, 2021), Deep Domain Confusion (DDC) (Tzeng et al., 2014), Correlation Alignment via Deep Neural Networks (Deep-Coral) (Sun et al., 2017), Central Moment Discrepancy (CMD) (Zellinger et al., 2017), Higher-order Moment Matching (HoMM) (Chen et al., 2020), Minimum Discrepancy Estimation for Deep Domain Adaptation (MMDA) (Rahman et al., 2020), Deep Subdomain Adaptation (DSAN) (Zhu et al., 2021), Domain-Adversarial Neural Networks (DANN) (Ganin et al., 2016), Conditional Adversarial Domain Adaptation (CDAN) (Long et al., 2018), A DIRT-T Approach to Unsupervised Domain Adaptation (DIRT) (Shu et al., 2018) and the Convolutional deep Domain Adaptation model for Time-Series data (CoDATS) (Wilson et al., 2020). In addition to the sequence of models, IWA requires an estimate of the density ratio between source and target domain. To compute this quantity we follow Bickel et al. (2007) and You et al. (2019, Section 4.3) and train a classifier discriminating between source and target data. The output of this classifier is then used to approximate the density ratio, denoted by $\beta$ in Algorithm 1.
Overall, to compute the results in our tables, we trained 16,680 models over a timeframe of approximately 1,500 GPU-hours using NVIDIA P100 16GB GPUs.
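The classifier-based density-ratio step can be sketched as follows. This is a minimal illustration of the discriminative estimator in the spirit of Bickel et al. (2007), with a small logistic regression written out in NumPy; the learning rate, step count and clipping bound B are our own illustrative choices, not values from the paper:

```python
import numpy as np

def _fit_logistic(X, z, lr=0.1, steps=500):
    """Minimal logistic regression by gradient descent (bias via appended column)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - z) / len(z)
    return lambda Xq: 1.0 / (1.0 + np.exp(-np.hstack([Xq, np.ones((len(Xq), 1))]) @ w))

def estimate_density_ratio(x_source, x_target, B=10.0):
    """Estimate beta(x) = dq_X/dp_X at the source points via a domain classifier.

    With P(target|x) from a classifier separating the two domains,
    dq_X/dp_X(x) ~ (n/m) * P(target|x) / P(source|x); the result is clipped
    to [0, B] to respect the bounded density ratio assumption.
    """
    n, m = len(x_source), len(x_target)
    X = np.vstack([x_source, x_target])
    z = np.concatenate([np.zeros(n), np.ones(m)])  # 0 = source, 1 = target
    predict = _fit_logistic(X, z)
    p_t = predict(x_source)
    ratio = (n / m) * p_t / np.clip(1.0 - p_t, 1e-12, None)
    return np.clip(ratio, 0.0, B)
```

When source and target coincide in distribution, the classifier cannot separate them and the estimated ratio is close to 1 everywhere, as it should be.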
For example, consider the top plot of Figure 2, where we compare the performance of Algorithm 1 to deep embedded validation (DEV) (You et al., 2019), a heuristic baseline, source-only regression (SOR, see Section 5.2), and each individual model in the sequence. The bottom plot shows the scaled aggregation weights, i.e. how much each individual model contributes to the aggregated prediction of IWA, DEV, and SOR. In this example, the given sequence of models is obtained from applying the algorithm proposed in Shu et al. (2018) with different hyper-parameter choices to the Heterogeneity Human Activity Recognition dataset (Stisen et al., 2015). See Section D.3 in the Supplementary Material for the exact hyper-parameter values.

5.2 BASELINES

As representatives of the most prominent approaches discussed in Section 1, we compare our method to heuristic model aggregation baselines and to model selection methods with theoretical error guarantees.
Heuristic Baselines The first baseline is majority voting on target data (TMV). It aggregates the predictions of all models by counting the overall class predictions and selects the class with the maximum prediction count as ensemble output. In addition, we implement three heuristic baselines which aggregate the vector-valued output, i.e. probabilities, of all classifiers using weights learned via linear regression. The final ensemble prediction is then made by selecting the class with the highest probability. The three heuristic regression baselines differ in the input used for the performed regression. Source-only regression (SOR) trains a regression model on classifier predictions (of the given models) and labels from the source domain only. Target majority voting regression (TMR) uses the same voting procedure as explained above to generate pseudo-labels on the target domain, which are then further used to train a linear regression model. In contrast, target confidence average regression (TCR) selects the highest average class probability over all classifiers to pseudo-label the target samples, which is then used for training the linear regression model.
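For illustration, the voting and source-only regression heuristics can be sketched in a few lines. This is our own NumPy sketch under the assumption that the class probabilities of all $l$ models are stacked into one array; it is not the implementation used in the experiments:

```python
import numpy as np

def target_majority_vote(probs_target):
    """TMV: per-sample majority vote over the hard class predictions of all models.

    probs_target : (l, m, k) class probabilities of l models on m target points.
    Returns the ensemble class index for each of the m points.
    """
    votes = probs_target.argmax(axis=2)                            # (l, m)
    k = probs_target.shape[2]
    counts = np.apply_along_axis(np.bincount, 0, votes, None, k)   # (k, m)
    return counts.argmax(axis=0)

def sor_weights(probs_source, y_source_onehot):
    """SOR: least-squares aggregation weights fit on source predictions and labels."""
    l, n, k = probs_source.shape
    A = probs_source.reshape(l, n * k).T        # (n*k, l) design matrix
    b = y_source_onehot.reshape(n * k)
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c
```

TMR and TCR reuse the same regression step as SOR, but fit the weights on pseudo-labeled target data instead of source labels.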
Baselines with Theoretical Error Guarantees We compare IWA to the model selection methods importance weighted validation (IWV) (Sugiyama et al., 2007) and deep embedded validation (DEV) (You et al., 2019), which select models according to their (importance weighted) target risk. Both methods assume the knowledge of an estimated density ratio between target and source domains. In our experiments we follow Bickel et al. (2007); You et al. (2019) and estimate this ratio, by using a classifier that discriminates between source and target domain (see Supplementary Material Section D for more details).
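The selection rule behind such baselines can be sketched as follows. Again, this is only a sketch: IWV as proposed by Sugiyama et al. (2007) operates on held-out source data with the task loss, while we use the squared loss and the same array shapes as before for simplicity:

```python
import numpy as np

def iwv_select(preds_source, y_source, beta_source):
    """IWV-style selection: pick the model with the smallest importance
    weighted source risk, an unbiased estimate of the target risk under
    covariate shift.

    preds_source : (l, n, d), y_source : (n, d), beta_source : (n,)
    """
    # weighted risk per model: (1/n) sum_k beta(x_k) ||f_i(x_k) - y_k||^2
    risks = np.mean(
        beta_source[None, :, None] * (preds_source - y_source[None]) ** 2,
        axis=(1, 2),
    )
    return int(np.argmin(risks))
```

In contrast to IWA, such a rule can never do better than the best single model in the sequence, which is exactly the limitation quantified by Eq. (3).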

5.3 DATASETS

We evaluate the previously mentioned methods according to a diverse set of datasets, including language, image and time-series data. All datasets have a train, evaluation and test split, with results only presented on the held-out test sets. For additional details we refer to Appendix C and D.
TransformedMoons This specific form of the twinning moons dataset is based on Zellinger et al. (2021). The source domain consists of two-dimensional input data points and their transformations into two opposing moon-shaped forms.
MiniDomainNet is a reduced version of DomainNet-2019 (Peng et al., 2019) consisting of six different image domains (Quickdraw, Real, Clipart, Sketch, Infograph, and Painting). In particular,
MiniDomainNet (Zellinger et al., 2021) reduces the number of classes of DomainNet-2019 to the top-five largest representatives in the training set of each class across all six domains.
AmazonReviews is based on Blitzer et al. (2006) and consists of text reviews from four domains: books, DVDs, electronics, and kitchen appliances. Reviews are encoded in feature vectors of bag-of-words unigrams and bigrams with binary labels indicating the rankings. From the four categories, we construct adaptation tasks by taking one category as the source domain and another as the target domain.
UCI-HAR The Human Activity Recognition (Anguita et al., 2013) dataset from the UC Irvine Repository contains data from three motion sensors (accelerometer, gyroscope and body sensors) of subjects performing activities from six categories, namely walking, walking upstairs, walking downstairs, standing, sitting, and lying down.
WISDM (Kwapisz et al., 2011) is a class-imbalanced dataset collected from accelerometer sensors, including GPS data, of 29 different subjects performing activities similar to those in the UCI-HAR dataset.
HHAR The Heterogeneity Human Activity Recognition (Stisen et al., 2015) dataset investigates sensor-, device- and workload-specific heterogeneities using 36 smartphones and smartwatches, comprising 13 different device models from four manufacturers.
Sleep-EDF The Sleep Stage Classification time-series setting aims to classify the electroencephalography (EEG) signals into five stages i.e., Wake (W), Non-Rapid Eye Movement stages (N1, N2, N3), and Rapid Eye Movement (REM). Analogous to Ragab et al. (2022); Eldele et al. (2021), we adopt the Sleep-EDF-20 dataset obtained from PhysioBank (Goldberger et al., 2000), which contains EEG readings from 20 healthy subjects.
We rely on the AdaTime benchmark suite (Ragab et al., 2022) in most evaluations. The four time-series datasets above are originally included there. We extend AdaTime to support the other discussed datasets as well, and extend its domain adaptation methods.

5.4 RESULTS

We separate the applied methods into two groups, namely heuristic methods and methods with theoretical error guarantees. All tables show accuracies of source-only (SO) and target-best (TB) models, where source-only denotes training without domain adaptation and target-best the best performing model obtained among all parameter settings. We highlight in bold the performance of the best performing method with theoretical error guarantees, and in italic the best performing heuristic. See Table 1 for results. The full tables can be found in the Supplementary Material, Section D.
Outperformance of theoretically justified methods: On all datasets, our method outperforms IWV and DEV, setting a new state of the art for solving parameter choice issues under theoretical guarantees.
Outperformance of heuristics: It is interesting to note that each heuristic outperforms IWV and DEV on at least five of seven datasets. Moreover, every heuristic outperforms the (average) target best model (TB) in at least two cases, making it impossible for any model selection method to win in these cases. These facts highlight the quality of the predictions of our chosen heuristics. However, each heuristic is outperformed by our method on at least five of seven datasets.
Information in aggregation weights and robustness w.r.t. inaccurate models: It is interesting to observe that, in contrast to the other heuristic aggregation baselines, the aggregation weights $\tilde c_1,\ldots,\tilde c_l$ of our method tend to be larger for accurate models, see Section D.5. Another result is that our method tends to be less sensitive to a high number of inaccurate models than the baselines, see Section D.6. This serves as another reason for its high empirical performance.

6 CONCLUSION AND FUTURE WORK

We present a constructive theory-based method for approaching parameter choice issues in the setting of unsupervised domain adaptation. Its theoretical approach relies on the extension of importance weighted least squares to vector-valued functions. The resulting aggregation method distinguishes itself by a wide applicability, to models ranging from trees to neural networks. A broad empirical comparative study on benchmark datasets for language,
images, body sensor signals and smartphone sensor signals underpins the theory-based optimality claim. It is left for future research to further refine the theory and its estimates, e.g., by exploiting concentration bounds from Gretton et al. (2006) or advanced density ratio estimators from Sugiyama et al. (2012).
Table 1: Mean and standard deviation (after ±) of target classification accuracy on the Amazon Reviews, Sleep-EDF, UCI-HAR, HHAR and WISDM datasets over three different random initializations of model weights and several domain adaptation tasks.
Amazon Reviews (heuristics: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA):

| Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| HoMM | 0.769(±0.009) | 0.777(±0.010) | 0.778(±0.010) | 0.778(±0.011) | 0.777(±0.010) | 0.765(±0.011) | 0.766(±0.011) | 0.778(±0.010) | 0.769(±0.012) |
| AdvSKM | 0.766(±0.012) | 0.780(±0.009) | 0.779(±0.010) | 0.779(±0.008) | 0.778(±0.011) | 0.769(±0.012) | 0.766(±0.012) | 0.780(±0.009) | 0.770(±0.012) |
| DIRT | 0.764(±0.009) | 0.786(±0.008) | 0.786(±0.010) | 0.786(±0.008) | 0.800(±0.008) | 0.778(±0.022) | 0.773(±0.056) | 0.787(±0.008) | 0.786(±0.009) |
| DDC | 0.766(±0.012) | 0.779(±0.010) | 0.780(±0.009) | 0.779(±0.010) | 0.778(±0.010) | 0.767(±0.017) | 0.768(±0.011) | 0.780(±0.010) | 0.770(±0.013) |
| CMD | 0.767(±0.012) | 0.791(±0.009) | 0.792(±0.009) | 0.789(±0.010) | 0.792(±0.010) | 0.765(±0.015) | 0.710(±0.015) | 0.794(±0.009) | 0.785(±0.009) |
| MMDA | 0.767(±0.011) | 0.787(±0.011) | 0.785(±0.010) | 0.785(±0.010) | 0.787(±0.012) | 0.769(±0.011) | 0.766(±0.010) | 0.787(±0.011) | 0.782(±0.011) |
| CoDATS | 0.766(±0.013) | 0.795(±0.009) | 0.793(±0.010) | 0.794(±0.012) | 0.799(±0.010) | 0.779(±0.016) | 0.773(±0.020) | 0.796(±0.009) | 0.791(±0.015) |
| Deep-Coral | 0.766(±0.012) | 0.784(±0.009) | 0.783(±0.009) | 0.783(±0.009) | 0.782(±0.009) | 0.769(±0.016) | 0.769(±0.037) | 0.785(±0.009) | 0.776(±0.013) |
| CDAN | 0.767(±0.012) | 0.788(±0.010) | 0.787(±0.009) | 0.787(±0.010) | 0.787(±0.011) | 0.775(±0.011) | 0.776(±0.014) | 0.788(±0.010) | 0.777(±0.011) |
| DANN | 0.767(±0.012) | 0.796(±0.010) | 0.792(±0.010) | 0.793(±0.010) | 0.800(±0.010) | 0.776(±0.011) | 0.778(±0.012) | 0.797(±0.009) | 0.798(±0.012) |
| DSAN | 0.769(±0.009) | 0.796(±0.009) | 0.792(±0.009) | 0.791(±0.010) | 0.800(±0.010) | 0.779(±0.012) | 0.763(±0.017) | 0.795(±0.009) | 0.789(±0.012) |
| Avg. | 0.767(±0.011) | 0.787(±0.009) | 0.786(±0.010) | 0.786(±0.010) | 0.789(±0.010) | 0.772(±0.014) | 0.764(±0.019) | 0.788(±0.009) | 0.781(±0.012) |
Sleep-EDF (heuristics: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA):

| Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| HoMM | 0.676(±0.036) | 0.722(±0.017) | 0.719(±0.023) | 0.718(±0.021) | 0.724(±0.032) | 0.726(±0.046) | 0.678(±0.035) | 0.747(±0.025) | 0.715(±0.047) |
| AdvSKM | 0.665(±0.058) | 0.708(±0.023) | 0.712(±0.027) | 0.712(±0.032) | 0.718(±0.030) | 0.703(±0.069) | 0.692(±0.038) | 0.722(±0.025) | 0.706(±0.054) |
| DIRT | 0.656(±0.058) | 0.743(±0.009) | 0.745(±0.012) | 0.748(±0.019) | 0.742(±0.031) | 0.679(±0.038) | 0.686(±0.066) | 0.749(±0.010) | 0.728(±0.037) |
| DDC | 0.646(±0.035) | 0.717(±0.029) | 0.721(±0.037) | 0.712(±0.031) | 0.695(±0.020) | 0.694(±0.056) | 0.666(±0.031) | 0.724(±0.012) | 0.704(±0.031) |
| CMD | 0.653(±0.057) | 0.740(±0.022) | 0.736(±0.016) | 0.723(±0.020) | 0.709(±0.015) | 0.716(±0.052) | 0.640(±0.068) | 0.729(±0.018) | 0.725(±0.053) |
| MMDA | 0.650(±0.051) | 0.736(±0.014) | 0.727(±0.021) | 0.723(±0.018) | 0.714(±0.028) | 0.704(±0.033) | 0.660(±0.034) | 0.745(±0.031) | 0.715(±0.042) |
| CoDATS | 0.672(±0.084) | 0.738(±0.029) | 0.739(±0.036) | 0.736(±0.030) | 0.723(±0.039) | 0.683(±0.090) | 0.690(±0.107) | 0.744(±0.021) | 0.715(±0.045) |
| Deep-Coral | 0.643(±0.049) | 0.716(±0.018) | 0.717(±0.028) | 0.712(±0.027) | 0.694(±0.032) | 0.700(±0.053) | 0.675(±0.077) | 0.713(±0.021) | 0.702(±0.070) |
| CDAN | 0.652(±0.056) | 0.732(±0.016) | 0.739(±0.024) | 0.739(±0.018) | 0.728(±0.029) | 0.697(±0.031) | 0.642(±0.065) | 0.748(±0.019) | 0.713(±0.045) |
| DANN | 0.641(±0.047) | 0.722(±0.017) | 0.723(±0.026) | 0.721(±0.025) | 0.714(±0.024) | 0.687(±0.034) | 0.644(±0.046) | 0.724(±0.018) | 0.710(±0.035) |
| DSAN | 0.653(±0.060) | 0.748(±0.008) | 0.740(±0.016) | 0.732(±0.016) | 0.728(±0.026) | 0.712(±0.070) | 0.589(±0.063) | 0.757(±0.016) | 0.700(±0.033) |
| Avg. | 0.655(±0.054) | 0.729(±0.018) | 0.729(±0.024) | 0.725(±0.023) | 0.717(±0.028) | 0.700(±0.052) | 0.660(±0.057) | 0.737(±0.020) | 0.712(±0.045) |
UCI-HAR (accuracy, mean (±std); heuristic criteria vs. criteria with theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.782 (±0.078) | 0.833 (±0.020) | 0.818 (±0.023) | 0.818 (±0.022) | 0.783 (±0.040) | 0.809 (±0.095) | 0.800 (±0.098) | 0.826 (±0.010) | 0.854 (±0.039)
AdvSKM | 0.724 (±0.059) | 0.791 (±0.024) | 0.800 (±0.022) | 0.810 (±0.022) | 0.768 (±0.042) | 0.707 (±0.100) | 0.711 (±0.167) | 0.800 (±0.022) | 0.811 (±0.039)
DIRT | 0.783 (±0.044) | 0.912 (±0.013) | 0.907 (±0.009) | 0.890 (±0.016) | 0.756 (±0.036) | 0.807 (±0.107) | 0.808 (±0.112) | 0.900 (±0.015) | 0.928 (±0.034)
DDC | 0.790 (±0.061) | 0.806 (±0.019) | 0.807 (±0.026) | 0.810 (±0.017) | 0.756 (±0.108) | 0.724 (±0.066) | 0.734 (±0.109) | 0.804 (±0.028) | 0.792 (±0.013)
CMD | 0.788 (±0.058) | 0.869 (±0.012) | 0.849 (±0.014) | 0.839 (±0.023) | 0.731 (±0.066) | 0.804 (±0.064) | 0.812 (±0.080) | 0.842 (±0.025) | 0.888 (±0.037)
MMDA | 0.785 (±0.018) | 0.819 (±0.022) | 0.812 (±0.028) | 0.800 (±0.032) | 0.759 (±0.085) | 0.773 (±0.073) | 0.767 (±0.107) | 0.807 (±0.025) | 0.840 (±0.055)
CoDATS | 0.760 (±0.037) | 0.854 (±0.022) | 0.832 (±0.027) | 0.832 (±0.006) | 0.785 (±0.057) | 0.801 (±0.079) | 0.794 (±0.078) | 0.846 (±0.016) | 0.867 (±0.012)
Deep-Coral | 0.790 (±0.035) | 0.810 (±0.007) | 0.800 (±0.022) | 0.808 (±0.030) | 0.771 (±0.023) | 0.768 (±0.044) | 0.773 (±0.087) | 0.808 (±0.016) | 0.806 (±0.022)
CDAN | 0.756 (±0.055) | 0.842 (±0.009) | 0.843 (±0.020) | 0.840 (±0.034) | 0.802 (±0.080) | 0.781 (±0.072) | 0.687 (±0.068) | 0.846 (±0.018) | 0.853 (±0.026)
DANN | 0.756 (±0.026) | 0.858 (±0.016) | 0.856 (±0.033) | 0.856 (±0.033) | 0.800 (±0.057) | 0.763 (±0.032) | 0.780 (±0.043) | 0.849 (±0.023) | 0.847 (±0.007)
DSAN | 0.762 (±0.032) | 0.849 (±0.023) | 0.843 (±0.033) | 0.854 (±0.025) | 0.749 (±0.065) | 0.775 (±0.043) | 0.744 (±0.035) | 0.858 (±0.023) | 0.865 (±0.038)
Avg. | 0.770 (±0.046) | 0.840 (±0.017) | 0.833 (±0.023) | 0.832 (±0.024) | 0.769 (±0.060) | 0.774 (±0.070) | 0.765 (±0.090) | 0.835 (±0.020) | 0.850 (±0.029)
HHAR (accuracy, mean (±std); heuristic criteria vs. criteria with theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.739 (±0.044) | 0.757 (±0.014) | 0.759 (±0.013) | 0.759 (±0.011) | 0.700 (±0.058) | 0.720 (±0.027) | 0.733 (±0.031) | 0.759 (±0.007) | 0.764 (±0.023)
AdvSKM | 0.718 (±0.042) | 0.749 (±0.027) | 0.742 (±0.032) | 0.748 (±0.034) | 0.676 (±0.046) | 0.730 (±0.051) | 0.728 (±0.051) | 0.752 (±0.031) | 0.749 (±0.025)
DIRT | 0.728 (±0.026) | 0.803 (±0.011) | 0.792 (±0.016) | 0.803 (±0.017) | 0.796 (±0.066) | 0.743 (±0.028) | 0.739 (±0.075) | 0.816 (±0.008) | 0.820 (±0.015)
DDC | 0.716 (±0.063) | 0.748 (±0.014) | 0.750 (±0.009) | 0.748 (±0.007) | 0.717 (±0.075) | 0.711 (±0.048) | 0.705 (±0.066) | 0.748 (±0.012) | 0.729 (±0.027)
CMD | 0.748 (±0.027) | 0.760 (±0.014) | 0.764 (±0.006) | 0.767 (±0.007) | 0.737 (±0.100) | 0.775 (±0.031) | 0.643 (±0.031) | 0.766 (±0.016) | 0.794 (±0.023)
MMDA | 0.738 (±0.036) | 0.783 (±0.017) | 0.781 (±0.016) | 0.780 (±0.015) | 0.698 (±0.038) | 0.719 (±0.036) | 0.731 (±0.047) | 0.780 (±0.017) | 0.785 (±0.035)
CoDATS | 0.710 (±0.030) | 0.766 (±0.023) | 0.772 (±0.040) | 0.773 (±0.050) | 0.722 (±0.064) | 0.739 (±0.028) | 0.739 (±0.040) | 0.812 (±0.009) | 0.785 (±0.039)
Deep-Coral | 0.745 (±0.046) | 0.766 (±0.012) | 0.762 (±0.015) | 0.766 (±0.027) | 0.681 (±0.073) | 0.754 (±0.054) | 0.758 (±0.244) | 0.764 (±0.006) | 0.776 (±0.023)
CDAN | 0.728 (±0.039) | 0.762 (±0.012) | 0.758 (±0.017) | 0.764 (±0.016) | 0.765 (±0.063) | 0.774 (±0.035) | 0.775 (±0.036) | 0.816 (±0.011) | 0.790 (±0.038)
DANN | 0.757 (±0.057) | 0.779 (±0.012) | 0.774 (±0.009) | 0.773 (±0.011) | 0.722 (±0.103) | 0.798 (±0.041) | 0.793 (±0.045) | 0.818 (±0.009) | 0.807 (±0.020)
DSAN | 0.721 (±0.053) | 0.803 (±0.010) | 0.797 (±0.014) | 0.802 (±0.007) | 0.724 (±0.065) | 0.741 (±0.033) | 0.596 (±0.031) | 0.825 (±0.008) | 0.826 (±0.046)
Avg. | 0.732 (±0.042) | 0.771 (±0.015) | 0.768 (±0.017) | 0.771 (±0.018) | 0.722 (±0.068) | 0.746 (±0.037) | 0.722 (±0.063) | 0.787 (±0.012) | 0.784 (±0.028)
WISDM (accuracy, mean (±std); heuristic criteria vs. criteria with theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.753 (±0.054) | 0.741 (±0.026) | 0.738 (±0.031) | 0.739 (±0.047) | 0.775 (±0.062) | 0.753 (±0.054) | 0.740 (±0.054) | 0.728 (±0.021) | 0.774 (±0.037)
AdvSKM | 0.747 (±0.050) | 0.771 (±0.043) | 0.781 (±0.055) | 0.779 (±0.035) | 0.742 (±0.062) | 0.747 (±0.050) | 0.747 (±0.135) | 0.777 (±0.031) | 0.779 (±0.041)
DIRT | 0.738 (±0.038) | 0.792 (±0.015) | 0.797 (±0.024) | 0.797 (±0.037) | 0.756 (±0.071) | 0.738 (±0.059) | 0.797 (±0.059) | 0.798 (±0.018) | 0.816 (±0.063)
DDC | 0.741 (±0.071) | 0.780 (±0.032) | 0.779 (±0.052) | 0.787 (±0.049) | 0.737 (±0.071) | 0.741 (±0.076) | 0.741 (±0.063) | 0.782 (±0.038) | 0.770 (±0.060)
CMD | 0.710 (±0.088) | 0.772 (±0.021) | 0.765 (±0.032) | 0.767 (±0.040) | 0.728 (±0.092) | 0.713 (±0.084) | 0.686 (±0.113) | 0.773 (±0.032) | 0.742 (±0.071)
MMDA | 0.759 (±0.047) | 0.789 (±0.017) | 0.772 (±0.030) | 0.745 (±0.035) | 0.754 (±0.050) | 0.759 (±0.047) | 0.750 (±0.047) | 0.790 (±0.018) | 0.775 (±0.030)
CoDATS | 0.711 (±0.039) | 0.775 (±0.018) | 0.757 (±0.027) | 0.751 (±0.020) | 0.682 (±0.057) | 0.709 (±0.039) | 0.735 (±0.054) | 0.764 (±0.015) | 0.770 (±0.019)
Deep-Coral | 0.694 (±0.030) | 0.717 (±0.041) | 0.723 (±0.037) | 0.713 (±0.035) | 0.664 (±0.055) | 0.694 (±0.030) | 0.670 (±0.149) | 0.723 (±0.026) | 0.736 (±0.044)
CDAN | 0.760 (±0.057) | 0.762 (±0.048) | 0.762 (±0.046) | 0.781 (±0.051) | 0.768 (±0.074) | 0.760 (±0.057) | 0.750 (±0.091) | 0.765 (±0.040) | 0.779 (±0.049)
DANN | 0.724 (±0.042) | 0.789 (±0.018) | 0.802 (±0.028) | 0.796 (±0.036) | 0.745 (±0.026) | 0.720 (±0.042) | 0.702 (±0.053) | 0.778 (±0.019) | 0.765 (±0.044)
DSAN | 0.759 (±0.030) | 0.765 (±0.024) | 0.769 (±0.034) | 0.756 (±0.025) | 0.757 (±0.068) | 0.759 (±0.030) | 0.663 (±0.025) | 0.722 (±0.013) | 0.779 (±0.044)
Avg. | 0.736 (±0.050) | 0.768 (±0.027) | 0.768 (±0.036) | 0.765 (±0.037) | 0.737 (±0.062) | 0.736 (±0.052) | 0.726 (±0.077) | 0.764 (±0.025) | 0.771 (±0.046)
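Every entry in the tables above is an accuracy aggregated over repeated runs, reported as mean (±standard deviation). A minimal sketch of how one such entry could be computed (the accuracy values below are illustrative only, not taken from the tables):

```python
import statistics

# Hypothetical accuracies of one adaptation method over five repeated runs.
runs = [0.753, 0.741, 0.738, 0.739, 0.775]

mean = statistics.mean(runs)
std = statistics.stdev(runs)  # sample standard deviation (n-1 denominator)

print(f"{mean:.3f} (±{std:.3f})")  # formatted like a table entry
```

Whether the tables use the sample (n-1) or population (n) standard deviation is not stated here; the sketch assumes the sample form.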

ACKNOWLEDGMENTS

The ELLIS Unit Linz, the LIT AI Lab, and the Institute for Machine Learning are supported by the Federal State Upper Austria. IARAI is supported by Here Technologies. We thank the projects AI-MOTION (LIT-2018-6-YOU-212), AI-SNN (LIT-2018-6-YOU-214), DeepFlood (LIT-2019-8-YOU-213), Medical Cognitive Computing Center (MC3), INCONTROL-RL (FFG-881064), PRIMAL (FFG-873979), S3AI (FFG-872172), DL for GranularFlow (FFG-871302), AIRI FG 9-N (FWF-36284, FWF-36235), and ELISE (H2020-ICT-2019-3 ID: 951847). We further thank AudiJKU GmbH, Anyline GmbH, Google, ZF Friedrichshafen AG, Robert Bosch GmbH, UCB Biopharma SRL, Merck Healthcare KGaA, Verbund AG, TÜV Austria, Frauscher Sensonic, and the NVIDIA Corporation. The research reported in this paper has been funded by the Federal Ministry for Climate Action, Environment, Energy, Mobility, Innovation and Technology (BMK), the Federal Ministry for Digital and Economic Affairs (BMDW), and the Province of Upper Austria in the frame of the COMET-Competence Centers for Excellent Technologies Programme and the COMET Module S3AI managed by the Austrian Research Promotion Agency FFG.

REFERENCES

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
W. Ahmed, P. Morerio, and V. Murino. Cleaning noisy labels by negative ensemble learning for source-free unsupervised domain adaptation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1616-1625, 2022.
S. Al-Stouhi and C. K. Reddy. Adaptive boosting for transfer learning using dynamic updates. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 60-75. Springer, 2011.
D. Anguita, A. Ghio, L. Oneto, X. Parra, and J. L. Reyes-Ortiz. A public domain dataset for human activity recognition using smartphones. European Symposium on Artificial Neural Networks, pp. 437-442, 2013.
B. Athiwaratkun, M. Finzi, P. Izmailov, and A. G. Wilson. There are many consistent explanations of unlabeled data: Why you should average. International Conference on Learning Representations (ICLR), 2019.
S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira. Analysis of representations for domain adaptation. In Advances in Neural Information Processing Systems, pp. 137-144, 2007.
S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. Machine Learning, 79(1-2):151-175, 2010.
S. Bickel, M. Brückner, and T. Scheffer. Discriminative learning for differing training and test distributions. In Proceedings of the 24th international conference on Machine learning, pp. 81-88, 2007.
A. Bietti and J. Mairal. Invariance and stability of deep convolutional representations. Advances in neural information processing systems, 30, 2017.
A. Bietti and J. Mairal. Group invariance, stability to deformations, and complexity of deep convolutional representations. The Journal of Machine Learning Research, 20(1):876-924, 2019.
L. Biewald. Experiment tracking with Weights and Biases, 2020. URL https://www.wandb.com/. Software available from wandb.com.
J. Blitzer, R. McDonald, and F. Pereira. Domain adaptation with structural correspondence learning. In Proceedings of the 2006 conference on empirical methods in natural language processing, pp. 120-128, 2006.
L. Breiman. Bagging predictors. Technical Report 421, Department of Statistics, UC Berkeley, 1994.
L. Breiman. Bagging predictors. Machine Learning, 24(2):123-140, 1996a.
L. Breiman. Stacked regressions. Machine Learning, 24(1):49-64, 1996b.
L. Breiman. Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics, 26(3):801-849, 1998.
A. Caponnetto and E. De Vito. Risk bounds for regularized least-squares algorithm with operator-valued kernels. Technical report, CBCL paper 249/CSAIL-TR-2005-031, MIT, 2005.
A. Caponnetto and E. De Vito. Optimal rates for the regularized least-squares algorithm. Foundations of Computational Mathematics, 7(3):331-368, 2007.
C. Chen, Z. Fu, Z. Chen, S. Jin, Z. Cheng, X. Jin, and X.-S. Hua. HoMM: Higher-order moment matching for unsupervised domain adaptation. Association for the Advancement of Artificial Intelligence (AAAI), 2020.
M. Chen, Z. Xu, K. Weinberger, and F. Sha. Marginalized denoising autoencoders for domain adaptation. Proceedings of the International Conference on Machine Learning, pp. 767-774, 2012.
F. Cucker and S. Smale. On the mathematical foundations of learning. Bulletin of the American mathematical society, 39(1):1-49, 2002.
C. O. da Costa-Luis. tqdm: A fast, extensible progress meter for Python and CLI. Journal of Open Source Software, 4(37):1277, 2019.
W. Dai, Q. Yang, G. R. Xue, and Y. Yu. Boosting for transfer learning. In Proceedings of the 24th International Conference on Machine Learning, pp. 193-200, 2007.
X. Dong, Z. Yu, W. Cao, Y. Shi, and Q. Ma. A survey on ensemble learning. Frontiers of Computer Science, 14(2):241-258, 2020.
L. Duan, I. W. Tsang, and D. Xu. Domain transfer multiple kernel learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(3):465-479, 2012.
R. M. Dudley. Real analysis and probability, volume 74. Cambridge University Press, 2002.
E. Eldele, Z. Chen, C. Liu, M. Wu, C.-K. Kwoh, X. Li, and C. Guan. An attention-based deep learning approach for sleep stage classification with single-channel EEG. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2021.
A. Fermanian, P. Marion, J. P. Vert, and G. Biau. Framing rnn as a kernel method: A neural ode approach. Advances in Neural Information Processing Systems, 34, 2021.
G. French, M. Mackiewicz, and M. Fisher. Self-ensembling for visual domain adaptation. International Conference on Learning Representations, 2018.
Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(Jan):1-35, 2016.
E. R. Gizewski, L. Mayer, B. A. Moser, D. H. Nguyen, S. Pereverzyev Jr, S. V. Pereverzyev, N. Shepeleva, and W. Zellinger. On a regularization of unsupervised domain adaptation in RKHS. Applied and Computational Harmonic Analysis, 57:201-227, 2022.
A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. Physiobank, physiotoolkit, and physionet components of a new research resource for complex physiologic signals. Circulation, 101(23):215-220, 2000.
I. Goodfellow, Y. Bengio, and A. Courville. Deep learning. MIT press, 2016.
A. Gretton, K. M. Borgwardt, M. Rasch, B. Schölkopf, and A. J. Smola. A kernel method for the two-sample-problem. In Advances in Neural Information Processing Systems, pp. 513-520, 2006.
J. M. Ha and B. D. Youn. A health data map-based ensemble of deep domain adaptation under inhomogeneous operating conditions for fault diagnosis of a planetary gearbox. IEEE Access, 9:79118-79127, 2021.
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
S. Hochreiter and J. Schmidhuber. Simplifying neural nets by discovering flat minima. Advances in Neural Information Processing Systems, 7, 1994.
S. Hochreiter and J. Schmidhuber. Flat minima. Neural Computation, 9(1):1-42, 1997.
J. Hoffman, M. Mohri, and N. Zhang. Algorithms and theory for multiple-source adaptation. Advances in Neural Information Processing Systems, 31, 2018.
J. Huang, A. Gretton, K. Borgwardt, B. Schölkopf, and A. Smola. Correcting sample selection bias by unlabeled data. Advances in Neural Information Processing Systems, 19, 2006.
H. Daume III and D. Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, 26:101-126, 2006.
J. Jiao, M. Zhao, J. Lin, and C. Ding. Classifier inconsistency-based domain adaptation network for partial transfer intelligent diagnosis. IEEE Transactions on Industrial Informatics, 16(9):5965-5974, 2019.
T. Kanamori, S. Hido, and M. Sugiyama. A least-squares approach to direct importance estimation. The Journal of Machine Learning Research, 10:1391-1445, 2009.
T. Kanamori, T. Suzuki, and M. Sugiyama. Statistical analysis of kernel-based least-squares density-ratio estimation. Machine Learning, 86(3):335-367, 2012.
G. Kang, L. Jiang, Y. Wei, Y. Yang, and A. G. Hauptmann. Contrastive adaptation network for single- and multi-source domain adaptation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
W. M. Kouw, J. H. Krijthe, and M. Loog. Robust importance-weighted cross-validation under sample selection bias. In IEEE International Workshop on Machine Learning for Signal Processing, pp. 1-6. IEEE, 2019.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
J. R. Kwapisz, G. M. Weiss, and S. A. Moore. Activity recognition using cell phone accelerometers. SIGKDD Explorations, 12(2):74-82, 2011.
S. Laine and T. Aila. Temporal ensembling for semi-supervised learning. International Conference on Learning Representations (ICLR), 2017.
Q. Liu and H. Xue. Adversarial spectral kernel matching for unsupervised time series domain adaptation. In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), 2021.
M. Long, Y. Cao, J. Wang, and M. Jordan. Learning transferable features with deep adaptation networks. In Proceedings of the International Conference on Machine Learning, pp. 97-105, 2015.
M. Long, Z. Cao, J. Wang, and M. I. Jordan. Conditional adversarial domain adaptation. Advances in Neural Information Processing Systems (NeurIPS), 31, 2018.
C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel. The variational fair auto encoder. International Conference on Learning Representations, 2016.
C. Ma and L. Wu. The barron space and the flow-induced function spaces for neural network models. Constructive Approximation, 55(1):369-406, 2022.
A. Mayr, G. Klambauer, T. Unterthiner, and S. Hochreiter. Deeptox: toxicity prediction using deep learning. Frontiers in Environmental Science, 3:80, 2016.
K. Musgrave, S. Belongie, and S.-N. Lim. Unsupervised domain adaptation: A reality check. arXiv preprint arXiv:2111.15672, 2021.
D. Nozza, E. Fersini, and E. Messina. Deep learning and ensemble methods for domain adaptation. In IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 184-189. IEEE, 2016.
S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345-1359, 2010.
A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
X. Peng, B. Usman, N. Kaushik, D. Wang, J. Hoffman, and K. Saenko. VisDA: A synthetic-to-real benchmark for visual domain adaptation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 2021-2026, 2018.
X. Peng, Q. Bai, X. Xia, Z. Huang, K. Saenko, and B. Wang. Moment matching for multi-source domain adaptation. In Proceedings of the IEEE International Conference on Computer Vision, pp. 1406-1415, 2019.
S. V. Pereverzyev. An Introduction to Artificial Intelligence based on Reproducing Kernel Hilbert Spaces. Birkhäuser Cham, 2022.
I. Pinelis. An approach to inequalities for the distributions of infinite-dimensional martingales. In Probability in Banach Spaces, 8: Proceedings of the Eighth International Conference, pp. 128-134. Springer, 1992.
M. Ragab, E. Eldele, W. L. Tan, C.-S. Foo, Z. Chen, M. Wu, C.-K. Kwoh, and X. Li. Adatime: A benchmarking suite for domain adaptation on time series data. arXiv preprint arXiv:2203.08321, 2022.
M. M. Rahman, C. Fookes, M. Baktashmotlagh, and S. Sridharan. On minimum discrepancy estimation for deep domain adaptation. Domain Adaptation for Visual Understanding, 2020.
S. Rakshit, B. Banerjee, G. Roig, and S. Chaudhuri. Unsupervised multi-source domain adaptation driven by deep adversarial ensemble learning. In German Conference on Pattern Recognition, pp. 485-498. Springer, 2019.
H. Raza and S. Samothrakis. Bagging adversarial neural networks for domain adaptation in non-stationary EEG. In 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1-7. IEEE, 2019.
L. Rosasco, M. Belkin, and E. De Vito. On learning with integral operators. Journal of Machine Learning Research, 11(2), 2010.
K. Saenko, B. Kulis, M. Fritz, and T. Darrell. Adapting visual category models to new domains. In European conference on computer vision, pp. 213-226. Springer, 2010.
K. Saito, Y. Ushiku, and T. Harada. Asymmetric tri-training for unsupervised domain adaptation. In International Conference on Machine Learning, pp. 2988-2997. PMLR, 2017.
K. Saito, D. Kim, P. Teterwak, S. Sclaroff, T. Darrell, and K. Saenko. Tune it the right way: Unsupervised validation of domain adaptation via soft neighborhood density. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9184-9193, 2021.
R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197-227, 1990.
H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90(2):227-244, 2000.
C. Shorten and T. M. Khoshgoftaar. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1):1-48, 2019.
R. Shu, H. Bui, H. Narui, and S. Ermon. A DIRT-T approach to unsupervised domain adaptation. International Conference on Learning Representations (ICLR), 2018.
A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow, M. B. Kjærgaard, A. Dey, T. Sonne, and M. M. Jensen. Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition. In Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems, SenSys '15, pp. 127-140, New York, NY, USA, 2015. Association for Computing Machinery. ISBN 9781450336314. doi: 10.1145/2809695.2809718.
G. Strang. Linear algebra and its applications. Academic Press, Orlando, FL, 1980.
M. Sugiyama, M. Krauledat, and K.-R. Müller. Covariate shift adaptation by importance weighted cross validation. Journal of Machine Learning Research, 8(5), 2007.
M. Sugiyama, T. Suzuki, and T. Kanamori. Density ratio estimation in machine learning. Cambridge University Press, 2012.
B. Sun and K. Saenko. Deep CORAL: Correlation alignment for deep domain adaptation. In Proceedings of the European Conference on Computer Vision, pp. 443-450, 2016.
B. Sun, J. Feng, and K. Saenko. Correlation alignment for unsupervised domain adaptation. Domain Adaptation in Computer Vision Applications, pp. 153-171, 2017.
W. Tu and S. Sun. Dynamical ensemble learning with model-friendly classifiers for domain adaptation. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), 2012.
A. Tarvainen and H. Valpola. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Advances in Neural Information Processing Systems, 30, 2017.
G. Teschl. Topics in Linear and Nonlinear Functional Analysis. Amer. Math. Soc., Providence, to appear, 2022a.
G. Teschl. Topics in Real Analysis. Amer. Math. Soc., Providence, to appear, 2022b.
E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474, 2014.
T. Varsavsky, M. Orbes-Arteaga, C. H. Sudre, M. S. Graham, P. Nachev, and M. J. Cardoso. Test-time unsupervised domain adaptation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 428-436. Springer, 2020.
G. Wilson and D. J. Cook. A survey of unsupervised deep domain adaptation. ACM Transactions on Intelligent Systems and Technology (TIST), 11(5):1-46, 2020.
G. Wilson, J. R. Doppa, and D. J. Cook. Multi-source deep domain adaptation with weak supervision for time-series sensor data. Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD), 2020.
D. H. Wolpert. Stacked generalization. Neural Networks, 5(2):241-259, 1992.
M. Wortsman, G. Ilharco, S. Y. Gadre, R. Roelofs, R. Gontijo-Lopes, A. S. Morcos, H. Namkoong, A. Farhadi, Y. Carmon, S. Kornblith, and L. Schmidt. Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time. arXiv preprint arXiv:2203.05482, 2022.
R. Xia, C. Zong, X. Hu, and E. Cambria. Feature ensemble plus sample selection: domain adaptation for sentiment classification. IEEE Intelligent Systems, 28(3):10-18, 2013.
R. Xu, Z. Chen, W. Zuo, J. Yan, and L. Lin. Deep cocktail network: Multi-source unsupervised domain adaptation with category shift. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3964-3973, 2018.
J. B. Yang, Q. Mao, Q. L. Xiang, I. W.-H. Tsang, K. M. A. Chai, and H. L. Chieu. Domain adaptation for coreference resolution: An adaptive ensemble approach. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 744-753, 2012.
K. You, X. Wang, M. Long, and M. Jordan. Towards accurate model selection in deep unsupervised domain adaptation. In Proceedings of the International Conference on Machine Learning, pp. 7124-7133. PMLR, 2019.
W. Zellinger, T. Grubinger, E. Lughofer, T. Natschläger, and S. Saminger-Platz. Central moment discrepancy (cmd) for domain-invariant representation learning. International Conference on Learning Representations, 2017.
W. Zellinger, T. Grubinger, M. Zwick, E. Lughofer, H. Schöner, T. Natschläger, and S. Saminger-Platz. Multi-source transfer learning of time series in cyclical manufacturing. Journal of Intelligent Manufacturing, 31(3):777-787, 2020.
W. Zellinger, N. Shepeleva, M.-C. Dinu, H. Eghbal-zadeh, H. D. Nguyen, B. Nessler, S. Pereverzyev, and B. A. Moser. The balancing principle for parameter choice in distance-regularized domain adaptation. Advances in Neural Information Processing Systems, 34, 2021.
K. Zhang, M. Gong, and B. Schölkopf. Multi-source domain adaptation: A causal view. In Twenty-ninth AAAI conference on artificial intelligence, 2015.
K. Zhou, Y. Yang, Y. Qiao, and T. Xiang. Domain adaptive ensemble learning. IEEE Transactions on Image Processing, 30:8008-8018, 2021.
Y. Zhu, F. Zhuang, J. Wang, G. Ke, J. Chen, J. Bian, H. Xiong, and Q. He. Deep subdomain adaptation network for image classification. IEEE Transactions on Neural Networks and Learning Systems, 32(4):1713-1722, 2021.
D. Zou, Q. Zhu, and P. Yan. Unsupervised domain adaptation with dual-scheme fusion network for medical image segmentation. In IJCAI, pp. 3291-3298, 2020.
Y. Zou, Z. Yu, B. V. K. Kumar, and J. Wang. Unsupervised domain adaptation for semantic segmentation via class-balanced self-training. In Proceedings of the European conference on computer vision (ECCV), pp. 289-305, 2018.

A NOTATION AND PROOF OF MAIN RESULT

The aim of this section is to give a full proof of our main result, Theorem 1 in the main paper. We start by introducing and summarizing the notation and the required concepts from functional analysis and measure theory, so that we can state and prove the required lemmas.

Summary of Notation

  • Spaces: input space $X \subseteq \mathbb{R}^{d_1}$ and label space $Y$ with inner product $\langle\cdot,\cdot\rangle_Y$. $Y$ is assumed to be a separable Hilbert space such that the associated norm satisfies $\|y\|_Y \le y_0$ for all $y \in Y$ and some $y_0 > 0$. Note that this setting is more general than the one from the main text, where we assumed $Y \subseteq \mathbb{R}^{d_2}$ (the simplification in the main text improves readability and respects space limits).
  • Datasets and Distributions: a source dataset $(\mathbf{x},\mathbf{y})=((x_1,y_1),\ldots,(x_n,y_n))\in(X\times Y)^n$ drawn independently according to the source distribution $p$ on $X\times Y$, and an unlabeled target dataset $\mathbf{x}=(x_1,\ldots,x_m)\in X^m$ drawn independently according to the marginal distribution $q_X$ of the target distribution $q$ (the marginal of $p$ is similarly denoted by $p_X$).
  • Source Risk: $R_p(f)=\int_{X\times Y}\|f(x)-y\|_Y^2\,dp(x,y)$.
  • Source Regression function: $f_p(x)=\int_Y y\,dp(y|x)$, a (vector-valued) integral in the sense of Lebesgue-Bochner.
  • Target Risk: $R_q(f)=\int_{X\times Y}\|f(x)-y\|_Y^2\,dq(x,y)$.
  • Target Regression function: $f_q(x)=\int_Y y\,dq(y|x)$, a (vector-valued) integral in the sense of Lebesgue-Bochner.

Problem

  • Given: a sequence $f_1,\ldots,f_l: X\to Y$ of models, the source sample $(\mathbf{x},\mathbf{y})$, and the unlabeled target sample $\mathbf{x}$.
  • Aim: find an aggregation $f=\sum_{i=1}^l c_i f_i$ with minimal $R_q(f)$.

Main Assumptions
  • covariate shift: $p(y|x)=q(y|x)$ and thus $f_p=f_q$.
  • bounded density ratio: there is $\beta: X\to[0,B]$ such that $dq_X(x)=\beta(x)\,dp_X(x)$.
Existence of the associated conditional probability measures is guaranteed by the fact that $X\times Y$ is Polish (a separable and complete metric space), cf. Dudley (2002, Theorem 10.2.2).
Notation from functional analysis/operator theory  Let $U$ and $V$ denote separable Hilbert spaces (i.e., they admit countable orthonormal bases) with associated inner products $\langle\cdot,\cdot\rangle_U$ and $\langle\cdot,\cdot\rangle_V$, respectively. Let us briefly recall some notions from functional analysis that we need in order to set up our theory. There are many standard references on these aspects, e.g. Teschl (2022a) and Teschl (2022b):
  • $\mathcal{L}(U,V)$: space of bounded linear operators $U\to V$ with the uniform norm $\|\cdot\|_{\mathcal{L}(U,V)}$; $\mathcal{L}(U)$: space of bounded linear operators $U\to U$.
  • For $A\in\mathcal{L}(U,V)$, its adjoint is denoted by $A^*\in\mathcal{L}(V,U)$ (and uniquely defined by the equation $\langle Au,v\rangle_V=\langle u,A^*v\rangle_U$ for any $u\in U$, $v\in V$).
  • If $A\in\mathcal{L}(U)$ and $A=A^*$, $A$ is called self-adjoint.
  • If $A\in\mathcal{L}(U)$ is self-adjoint and $\langle Au,u\rangle_U\ge 0$ for any $u\in U$, then $A$ is called positive. Equivalently: there exists a (unique) bounded and self-adjoint $B=:\sqrt{A}\in\mathcal{L}(U)$ such that $B^2=A$.
  • Trace of an operator $A\in\mathcal{L}(U)$: $\operatorname{Tr}(A)=\sum_k\langle Ae_k,e_k\rangle_U$ for any orthonormal basis $(e_k)_{k=1}^\infty$ of $U$ (independent of the choice of basis). If $\operatorname{Tr}(A)<\infty$, $A$ is called trace class.
  • $\mathcal{L}_2(U)$: separable Hilbert space of Hilbert-Schmidt operators on $U$ with scalar product $\langle A,B\rangle_{\mathcal{L}_2(U)}=\operatorname{Tr}(B^*A)$ and norm $\|A\|_{\mathcal{L}_2(U)}=\sqrt{\operatorname{Tr}(A^*A)}\ge\|A\|_{\mathcal{L}(U)}$.
  • $A: U\to V$ is called Hilbert-Schmidt if $A^*A$ is trace class. Also here: $\|A\|_{\mathcal{L}_2(U,V)}=\sqrt{\operatorname{Tr}(A^*A)}\ge\|A\|_{\mathcal{L}(U,V)}$.
  • For a (probability) measure $q$ on $X$ (or $Y$) and appropriate functions $F: X\to U$ (e.g. strongly measurable and such that $\|F\|_U$ is integrable w.r.t. $q$) we denote the usual ($U$-valued) Bochner integral of $F$ as $\int_X F(x)\,dq(x)$. We denote the associated $L^p$-spaces by $L^p(X,q,U)$, or $L^p(q)$ for short, if the associated spaces are clear from the context.
Assumptions on models  We assume that the regression function $f=f_p=f_q$ as well as the models $f_1,\ldots,f_l$ belong to a hypothesis space $\mathcal{H}\subseteq\mathcal{C}(X,Y)\cap L^2(p_X)\cap L^2(q_X)$, where $\mathcal{C}(X,Y)$ denotes the space of bounded continuous functions $X\to Y$. The space $\mathcal{H}$ should satisfy the following assumptions, which are discussed in much greater detail in Caponnetto & De Vito (2007) and Caponnetto & De Vito (2005):
Hypothesis 1 (Caponnetto & De Vito, 2007). The space $\mathcal{H}$ is a separable Hilbert space of functions $f: X\to Y$ such that:
  • For all $x\in X$ there is a Hilbert-Schmidt operator $K_x: Y\to\mathcal{H}$ satisfying
$$(5)\quad f(x)=K_x^*f,\qquad f\in\mathcal{H};$$
  • The function from $X\times X$ to $\mathbb{R}$,
$$(6)\quad (x,t)\mapsto\langle K_t v,K_x w\rangle_{\mathcal{H}},\ \text{is measurable}\quad\forall v,w\in Y;$$
  • There is $\kappa>0$ such that
$$(7)\quad \operatorname{Tr}(K_x^*K_x)\le\kappa,\qquad x\in X.$$
Moreover, we assume that the norms $\|f_k\|_{\mathcal{H}}$, $k=1,2,\ldots,l$, are under our control, such that we can put a threshold $\gamma_l>0$ and consider $\|f_k\|_{\mathcal{H}}\le\gamma_l$.
Further useful observations  We then have
$$(8)\quad K_t^*K_x=K(t,x)\in\mathcal{L}_2(Y)\qquad\forall x,t\in X.$$
Given $x\in X$, the operator
$$(9)\quad T_x=K_xK_x^*\in\mathcal{L}_2(\mathcal{H})$$
is a positive Hilbert-Schmidt operator, and (7) ensures
$$(10)\quad \|T_x\|_{\mathcal{L}(\mathcal{H})}\le\|T_x\|_{\mathcal{L}_2(\mathcal{H})}=\|K(x,x)\|_{\mathcal{L}_2(Y)}\le\kappa.$$
Let $T_{q_X}:\mathcal{H}\to\mathcal{H}$ be
$$T_{q_X}=\int_X T_x\,dq_X(x),$$
where the integral converges in $\mathcal{L}_2(\mathcal{H})$ to a positive trace class operator with
$$(11)\quad \|T_{q_X}\|_{\mathcal{L}(\mathcal{H})}\le\|T_{q_X}\|_{\mathcal{L}_2(\mathcal{H})}\le\operatorname{Tr}(T_{q_X})=\int_X\operatorname{Tr}(T_x)\,dq_X(x)\le\kappa.$$
Following Proposition 1 in Caponnetto & De Vito (2007), the minimizer $f_q$ of the expected risk $R_q$ is the solution of the following equation:
$$T_{q_X}f_q=g,$$
where
$$g=\int_X K_x f_q(x)\,dq_X(x)\in\mathcal{H},$$
with the integral converging in $\mathcal{H}$.
Next we define the operators
$$T_{\mathbf{x}}=\frac{1}{m}\sum_{j=1}^m K_{x_j}K_{x_j}^*,$$
$$T_{\mathbf{x},\beta}=\frac{1}{n}\sum_{i=1}^n\beta(x_i)K_{x_i}K_{x_i}^*,$$
$$g_{\mathbf{x},\mathbf{y},\beta}=\frac{1}{n}\sum_{i=1}^n\beta(x_i)K_{x_i}y_i.$$
In the sequel we adopt the convention that $C$ denotes a generic positive coefficient, which can vary from appearance to appearance and may only depend on basic parameters such as $p_X$, $q_X$, $\kappa$, $B$, $y_0$ and others introduced below, but not on $n$, $m$ and the error probability $\delta>0$.
We will need the following statements.
Lemma 1. With probability at least $1-\delta$ we have
$$(12)\quad \|T_{q_X}-T_{\mathbf{x}}\|_{\mathcal{L}(\mathcal{H})}\le\|T_{q_X}-T_{\mathbf{x}}\|_{\mathcal{L}_2(\mathcal{H})}\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)m^{-1/2},$$
$$(13)\quad \|T_{\mathbf{x}}-T_{\mathbf{x},\beta}\|_{\mathcal{L}(\mathcal{H})}\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)(n^{-1/2}+m^{-1/2}),$$
$$(14)\quad \|T_{\mathbf{x},\beta}f-g_{\mathbf{x},\mathbf{y},\beta}\|_{\mathcal{H}}\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)n^{-1/2},$$
where $C>0$ does not depend on $n$, $m$ and $\delta$.
The proof of Lemma 1 is based on Lemma 4 of Huang et al. (2006), which we formulate in our notation as follows.
Lemma 2 (Huang et al., 2006). Let $\phi$ be a map from $X$ to $U$ such that $\|\phi(x)\|_U\le R$ for all $x\in X$. Then with probability at least $1-\delta$ it holds that
$$\Big\|\frac{1}{m}\sum_{j=1}^m\phi(x_j)-\frac{1}{n}\sum_{i=1}^n\beta(x_i)\phi(x_i)\Big\|_U\le\Big(1+\sqrt{2\log\frac{2}{\delta}}\Big)R\sqrt{\frac{B^2}{n}+\frac{1}{m}}.$$
Moreover, we will need a concentration inequality that follows from Pinelis (1992), see also Rosasco et al. (2010).
Lemma 3 (Concentration lemma). If $\xi_1,\xi_2,\ldots,\xi_n$ are zero-mean independent random variables with values in a separable Hilbert space $U$, and for some $D>0$ one has $\|\xi_i\|_U\le D$, $i=1,2,\ldots,n$, then the bound
$$\Big\|\frac{1}{n}\sum_{i=1}^n\xi_i\Big\|_U\le D\sqrt{\frac{2\log\frac{2}{\delta}}{n}}$$
holds true with probability at least $1-\delta$.
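As a quick numerical plausibility check (illustrative only, not part of the proof), one can simulate bounded zero-mean random vectors in $\mathbb{R}^d$ as a stand-in for the separable Hilbert space $U$ and compare the norm of their empirical mean against the bound $D\sqrt{2\log(2/\delta)/n}$; all names below are ours:

```python
import numpy as np

# Plausibility check of the concentration lemma: draw n i.i.d. zero-mean
# vectors xi_i in R^d with ||xi_i|| <= D and compare ||(1/n) sum_i xi_i||
# with the bound D * sqrt(2 * log(2/delta) / n).
rng = np.random.default_rng(0)
n, d, D, delta = 1000, 5, 1.0, 0.05

# Uniform on a sphere of radius D/2: zero mean by symmetry, norms bounded by D.
xi = rng.normal(size=(n, d))
xi = (D / 2.0) * xi / np.linalg.norm(xi, axis=1, keepdims=True)

mean_norm = np.linalg.norm(xi.mean(axis=0))
bound = D * np.sqrt(2.0 * np.log(2.0 / delta) / n)
```

For this sample size the empirical mean's norm is well below the high-probability bound, as the lemma predicts.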

Proof of Lemma 1.

Let us start by proving (12) by introducing the map $\xi: X\to\mathcal{L}_2(\mathcal{H})$ as $\xi(x)=K_xK_x^*-T_{q_X}$. From (10) and (11) it follows that
$$\|\xi(x)\|_{\mathcal{L}_2(\mathcal{H})}\le\|K_xK_x^*\|_{\mathcal{L}_2(\mathcal{H})}+\|T_{q_X}\|_{\mathcal{L}_2(\mathcal{H})}\le 2\kappa.$$
Moreover, we have
$$\int_X\xi(x)\,dq_X(x)=\int_X K_xK_x^*\,dq_X(x)-T_{q_X}=0.$$
Therefore, for $x_j$, $j=1,2,\ldots,m$, drawn i.i.d. from the marginal probability measure $q_X$, the corresponding operators $\xi_j=\xi(x_j)$ can be treated as zero-mean independent random variables in $\mathcal{L}_2(\mathcal{H})$, such that the conditions of the concentration lemma are satisfied with $D=2\kappa$, and
$$\|T_{\mathbf{x}}-T_{q_X}\|_{\mathcal{L}_2(\mathcal{H})}=\Big\|\frac{1}{m}\sum_{j=1}^m K_{x_j}K_{x_j}^*-T_{q_X}\Big\|_{\mathcal{L}_2(\mathcal{H})}=\Big\|\frac{1}{m}\sum_{j=1}^m\xi_j\Big\|_{\mathcal{L}_2(\mathcal{H})}\le 2\kappa\sqrt{\frac{2\log\frac{2}{\delta}}{m}}.$$
To obtain (13), for any $f\in\mathcal{H}$ we define a map $\phi=\phi_f: X\to\mathcal{H}$ as $\phi_f(x)=K_xK_x^*f$. It is clear that
$$\|\phi_f(x)\|_{\mathcal{H}}\le\|K_xK_x^*\|_{\mathcal{L}(\mathcal{H})}\|f\|_{\mathcal{H}}\le\kappa\|f\|_{\mathcal{H}}.$$
Therefore, for the map $\phi=\phi_f$ the condition of the above Lemma 2 is satisfied with $R=\kappa\|f\|_{\mathcal{H}}$. Then directly from that lemma, for any $f\in\mathcal{H}$ we have
$$\|T_{\mathbf{x}}f-T_{\mathbf{x},\beta}f\|_{\mathcal{H}}=\Big\|\frac{1}{m}\sum_{j=1}^m\phi_f(x_j)-\frac{1}{n}\sum_{i=1}^n\beta(x_i)\phi_f(x_i)\Big\|_{\mathcal{H}}$$
$$\le\Big(1+\sqrt{2\log\frac{2}{\delta}}\Big)\sqrt{\frac{B^2}{n}+\frac{1}{m}}\;\kappa\|f\|_{\mathcal{H}}$$
$$\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)(m^{-1/2}+n^{-1/2})\|f\|_{\mathcal{H}},$$
which proves (13).
Consider now the map $F: X\times Y\to\mathcal{H}$ defined by
$$F(x,y)=\beta(x)K_x(f_p(x)-y).$$
Recall that $\|K_x\|_{\mathcal{L}(Y,\mathcal{H})}\le\sqrt{\operatorname{Tr}(K_x^*K_x)}\le\sqrt{\kappa}$. Then we obtain:
$$\|F(x,y)\|_{\mathcal{H}}\le\|K_x\|_{\mathcal{L}(Y,\mathcal{H})}\Big\|\int_Y y'\,dp(y'|x)-y\Big\|_Y|\beta(x)|\le 2y_0B\sqrt{\kappa}.$$
Moreover, for $p(x,y)=p(y|x)p_X(x)$ we have
$$\int_{X\times Y}F(x,y)\,dp(x,y)=\int_X K_x\beta(x)\int_Y\Big(\int_Y y'\,dp(y'|x)-y\Big)dp(y|x)\,dp_X(x)=0,$$
such that for $(x_i,y_i)$, $i=1,2,\ldots,n$, drawn i.i.d. from the measure $p(x,y)$, the corresponding values $F_i=F(x_i,y_i)$ are zero-mean independent random variables in $\mathcal{H}$. Then for the just defined $F_i=\beta(x_i)K_{x_i}(f_q(x_i)-y_i)$ the conditions of Lemma 3 are satisfied with $D=2y_0B\sqrt{\kappa}$, such that
$$\Big\|\frac{1}{n}\sum_{i=1}^nF_i\Big\|_{\mathcal{H}}=\Big\|\frac{1}{n}\sum_{i=1}^n\beta(x_i)K_{x_i}(f_q(x_i)-y_i)\Big\|_{\mathcal{H}}$$
$$=\Big\|\frac{1}{n}\sum_{i=1}^n\beta(x_i)K_{x_i}K_{x_i}^*f_q-\frac{1}{n}\sum_{i=1}^n\beta(x_i)K_{x_i}y_i\Big\|_{\mathcal{H}}$$
$$=\|T_{\mathbf{x},\beta}f_q-g_{\mathbf{x},\mathbf{y},\beta}\|_{\mathcal{H}}\le 2y_0B\sqrt{\kappa}\sqrt{\frac{2\log\frac{2}{\delta}}{n}}.$$
This bound gives us (14).
Aggregation for vector-valued functions  Next we construct a new approximant in the form of a linear combination of the approximants $f_1,f_2,\ldots,f_l$, computed for all tried parameter values. The linear combination of the approximants is computed as
$$(15)\quad f=\sum_{k=1}^l c_kf_k.$$
Since $f_1,f_2,\ldots,f_l$ belong to the RKHS $\mathcal{H}$, it is clear that $f\in\mathcal{H}$. Now we want to argue on how close we can get to $f_q$. Following Proposition 1 in Caponnetto & De Vito (2007), we have
$$(16)\quad R_q(f)-R_q(f_q)=\|f-f_q\|^2_{L^2(q_X)}=\|T_{q_X}^{1/2}(f-f_q)\|^2_{\mathcal{H}}.$$
Next we observe that the best approximation $f$ of the target regression function $f_q$ by linear combinations corresponds to the vector $c=(c_1,\ldots,c_l)$ of ideal coefficients in (15) that solves the linear system $Gc=\bar{g}$ with the Gram matrix $G=(\langle T_{q_X}^{1/2}f_k,T_{q_X}^{1/2}f_u\rangle_{\mathcal{H}})_{k,u=1}^l$ and the right-hand side vector $\bar{g}=(\langle T_{q_X}^{1/2}f_q,T_{q_X}^{1/2}f_k\rangle_{\mathcal{H}})_{k=1}^l$. Let us provide a proof of this short observation in the next lemma. Note that the entries of $G$ and $\bar{g}$ can equivalently also be formulated in terms of $\langle\cdot,\cdot\rangle_{L^2(q_X)}$, as done in the main text. We are going to use this formulation in the next lemma in order to be compatible with the main text (switching to the inner products in terms of $\mathcal{H}$ would not change the argument of the proof at all):
Lemma 4. The best $L^2(q_X)$-approximation $f$ of the target regression function $f_q$ by linear combinations corresponds to the vector $c=(c_1,\ldots,c_l)=G^{-1}\bar{g}$.
Proof. Let us denote (16) by $f(c)$ and rewrite this expression appropriately:
$$f(c)=\sum_{i,j=1}^l c_ic_j\langle f_i,f_j\rangle_{L^2(q_X)}-2\sum_{i=1}^l c_i\langle f_i,f_q\rangle_{L^2(q_X)}+\langle f_q,f_q\rangle_{L^2(q_X)}.$$
Taking the derivative with respect to $c_i$ yields:
$$\frac{\partial f(c)}{\partial c_i}=2\Big(\sum_{j=1}^l c_j\langle f_i,f_j\rangle_{L^2(q_X)}-\langle f_i,f_q\rangle_{L^2(q_X)}\Big).$$
Setting these derivatives to zero (for all $i\in\{1,\ldots,l\}$) gives the claimed equation. Noting that the Hessian is equal to $2G$ (and thus positive definite) ensures that $c$ is a global minimum of $f(c)$.
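Lemma 4 is easy to illustrate on simulated data. The sketch below (our names; scalar-valued models sampled at $m$ points as a finite stand-in for $L^2(q_X)$) forms $G$ and $\bar{g}$ from sample averages, solves $Gc=\bar{g}$, and checks that the gradient of the quadratic objective vanishes at $c$:

```python
import numpy as np

# Finite-sample illustration of Lemma 4 (a sketch, not the proof).
rng = np.random.default_rng(1)
m, l = 500, 4
F = rng.normal(size=(m, l))           # column k: sampled values of model f_k
fq = rng.normal(size=m)               # sampled values of the target f_q

G = F.T @ F / m                       # empirical <f_k, f_u>_{L2(qX)}
g_bar = F.T @ fq / m                  # empirical <f_q, f_k>_{L2(qX)}
c = np.linalg.solve(G, g_bar)         # ideal coefficients from Lemma 4

def objective(coef):
    # f(c) = || sum_k c_k f_k - f_q ||^2, approximated by a sample average
    r = F @ coef - fq
    return float(r @ r) / m

grad = 2.0 * (G @ c - g_bar)          # gradient at c; vanishes at the minimum
```

Since the objective is a least-squares problem in disguise, `c` coincides with the usual least-squares solution and any perturbation of `c` increases the objective.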
But, of course, neither the Gram matrix $G$ nor the vector $\bar{g}$ is accessible, because there is no access to the target measure $q_X$, so we switch to the empirical counterparts $\tilde{G}$ and $\tilde{g}$.
Then the following lemma is helpful to gain some information on the error made by the empirical average:
Lemma 5. With probability $1-\delta$ we have
$$\Big|\langle T_{q_X}^{1/2}f_u,T_{q_X}^{1/2}f_k\rangle_{\mathcal{H}}-\frac{1}{m}\sum_{j=1}^m\langle f_k(x_j),f_u(x_j)\rangle_Y\Big|\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)m^{-1/2},$$
$$\Big|\langle T_{q_X}^{1/2}f_k,T_{q_X}^{1/2}f_q\rangle_{\mathcal{H}}-\frac{1}{n}\sum_{i=1}^n\beta(x_i)\langle f_k(x_i),y_i\rangle_Y\Big|\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)(n^{-1/2}+m^{-1/2}),$$
where $C>0$ does not depend on $n$, $m$ and $\delta$.
Proof. Keeping in mind that $f_q,f_k\in\mathcal{H}$, we have
$$\langle T_{q_X}^{1/2}f_u,T_{q_X}^{1/2}f_k\rangle_{\mathcal{H}}=\langle T_{\mathbf{x}}f_k,f_u\rangle_{\mathcal{H}}+\langle(T_{q_X}-T_{\mathbf{x}})f_u,f_k\rangle_{\mathcal{H}}$$
$$=\frac{1}{m}\sum_{j=1}^m\langle K_{x_j}K_{x_j}^*f_k,f_u\rangle_{\mathcal{H}}+\langle(T_{q_X}-T_{\mathbf{x}})f_u,f_k\rangle_{\mathcal{H}}$$
$$=\frac{1}{m}\sum_{j=1}^m\langle K_{x_j}^*f_k,K_{x_j}^*f_u\rangle_Y+\langle(T_{q_X}-T_{\mathbf{x}})f_u,f_k\rangle_{\mathcal{H}}$$
$$=\frac{1}{m}\sum_{j=1}^m\langle f_k(x_j),f_u(x_j)\rangle_Y+\langle(T_{q_X}-T_{\mathbf{x}})f_u,f_k\rangle_{\mathcal{H}}.$$
Moreover, from (12), with probability $1-\delta$ we have that
$$|\langle(T_{q_X}-T_{\mathbf{x}})f_u,f_k\rangle_{\mathcal{H}}|\le C\|f_u\|_{\mathcal{H}}\|f_k\|_{\mathcal{H}}\Big(\log^{1/2}\frac{1}{\delta}\Big)m^{-1/2}.$$
Then
$$\Big|\langle T_{q_X}^{1/2}f_u,T_{q_X}^{1/2}f_k\rangle_{\mathcal{H}}-\frac{1}{m}\sum_{j=1}^m\langle f_k(x_j),f_u(x_j)\rangle_Y\Big|\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)m^{-1/2}.$$
Now we prove the second statement in Lemma 5. We have
$$\langle T_{q_X}^{1/2}f_k,T_{q_X}^{1/2}f_q\rangle_{\mathcal{H}}=\langle f_k,T_{q_X}f_q\rangle_{\mathcal{H}}=\langle f_k,T_{q_X}f_q-g_{\mathbf{x},\mathbf{y},\beta}\rangle_{\mathcal{H}}+\langle f_k,g_{\mathbf{x},\mathbf{y},\beta}\rangle_{\mathcal{H}}$$
$$=\frac{1}{n}\sum_{i=1}^n\beta(x_i)\langle f_k,K_{x_i}y_i\rangle_{\mathcal{H}}+\langle f_k,T_{q_X}f_q-g_{\mathbf{x},\mathbf{y},\beta}\rangle_{\mathcal{H}}$$
$$=\frac{1}{n}\sum_{i=1}^n\beta(x_i)\langle K_{x_i}^*f_k,y_i\rangle_Y+\langle f_k,T_{q_X}f_q-g_{\mathbf{x},\mathbf{y},\beta}\rangle_{\mathcal{H}}$$
$$=\frac{1}{n}\sum_{i=1}^n\beta(x_i)\langle f_k(x_i),y_i\rangle_Y+\langle f_k,T_{q_X}f_q-g_{\mathbf{x},\mathbf{y},\beta}\rangle_{\mathcal{H}}.$$
From Lemma 1, with probability $1-\delta$ we have
$$\|T_{q_X}f_q-g_{\mathbf{x},\mathbf{y},\beta}\|_{\mathcal{H}}\le\|T_{q_X}f_q-T_{\mathbf{x}}f_q\|_{\mathcal{H}}+\|T_{\mathbf{x}}f_q-g_{\mathbf{x},\mathbf{y},\beta}\|_{\mathcal{H}}$$
$$\le\|T_{q_X}f_q-T_{\mathbf{x}}f_q\|_{\mathcal{H}}+\|T_{\mathbf{x}}f_q-T_{\mathbf{x},\beta}f_q\|_{\mathcal{H}}+\|T_{\mathbf{x},\beta}f_q-g_{\mathbf{x},\mathbf{y},\beta}\|_{\mathcal{H}}$$
$$\le C\|f_q\|_{\mathcal{H}}\Big(\log^{1/2}\frac{1}{\delta}\Big)m^{-1/2}+C\|f_q\|_{\mathcal{H}}\Big(\log^{1/2}\frac{1}{\delta}\Big)(n^{-1/2}+m^{-1/2})+C\Big(\log^{1/2}\frac{1}{\delta}\Big)n^{-1/2}.$$
Then
$$\langle f_k,T_{q_X}f_q-g_{\mathbf{x},\mathbf{y},\beta}\rangle_{\mathcal{H}}\le C\|f_k\|_{\mathcal{H}}\Big(\log^{1/2}\frac{1}{\delta}\Big)(n^{-1/2}+m^{-1/2}).$$
Therefore,
$$\Big|\langle T_{q_X}^{1/2}f_k,T_{q_X}^{1/2}f_q\rangle_{\mathcal{H}}-\frac{1}{n}\sum_{i=1}^n\beta(x_i)\langle f_k(x_i),y_i\rangle_Y\Big|\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)(n^{-1/2}+m^{-1/2}).$$
Towards our main generalization bound  Next we use similar arguments as in Theorem 4 of Gizewski et al. (2022) to obtain our main result, Theorem 1. Lemma 5 suggests approximating $G$ and $\bar{g}$ by their empirical counterparts:
$$(17)\quad \tilde{G}=\Big(\frac{1}{m}\sum_{j=1}^m\langle f_k(x_j),f_u(x_j)\rangle_Y\Big)_{k,u=1}^l,$$
$$(18)\quad \tilde{g}=\Big(\frac{1}{n}\sum_{i=1}^n\beta(x_i)\langle y_i,f_k(x_i)\rangle_Y\Big)_{k=1}^l,$$
which can be effectively computed from data samples. Moreover, again from Lemma 5 we can argue that with probability $1-\delta$ it holds:
$$(19)\quad \|\bar{g}-\tilde{g}\|_{\mathbb{R}^l}\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)(n^{-1/2}+m^{-1/2}),$$
$$(20)\quad \|G-\tilde{G}\|_{\mathcal{L}(\mathbb{R}^l)}\le C\Big(\log^{1/2}\frac{1}{\delta}\Big)m^{-1/2}.$$
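In code, (17) and (18) are plain sample averages of model predictions. The following sketch assumes scalar-valued models given as callables, a labeled source sample with a known density-ratio function (here called `beta`), and an unlabeled target sample; all names are illustrative:

```python
import numpy as np

# Sketch of the empirical estimates (17)-(18) and the resulting aggregation.
def empirical_G(models, xt):
    P = np.stack([f(xt) for f in models])        # shape (l, m): f_k(x_j)
    return P @ P.T / xt.shape[0]                 # G~ from (17)

def empirical_g(models, xs, ys, beta):
    P = np.stack([f(xs) for f in models])        # shape (l, n): f_k(x_i)
    return P @ (beta(xs) * ys) / xs.shape[0]     # g~ from (18)

def aggregate(models, xs, ys, beta, xt):
    # Solve G~ c~ = g~ and return the aggregated model sum_k c~_k f_k.
    c = np.linalg.solve(empirical_G(models, xt),
                        empirical_g(models, xs, ys, beta))
    return c, (lambda x: sum(ck * f(x) for ck, f in zip(c, models)))
```

For instance, with models $f_1(x)=x$, $f_2(x)\equiv 1$, labels $y=2x+1$ and $\beta\equiv 1$ (no shift), `aggregate` recovers coefficients close to $(2,1)$.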
With the matrix $\tilde{G}$ at hand, one can easily check whether or not it is well-conditioned and $\tilde{G}^{-1}$ exists (otherwise one needs to get rid of models with similar performance). Thus the norms $\|\tilde{G}\|_{\mathcal{L}(\mathbb{R}^l)}$ and $\|\tilde{G}^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}$ can be bounded independently of $m$ and $n$, due to the fact that all their entries can be bounded as follows (we only do the calculation for the entries of $\tilde{G}$):
$$|\tilde{G}_{k,u}|\le\frac{1}{m}\sum_{j=1}^m|\langle f_k(x_j),f_u(x_j)\rangle_Y|=\frac{1}{m}\sum_{j=1}^m|\langle K_{x_j}^*f_k,K_{x_j}^*f_u\rangle_Y|$$
$$=\frac{1}{m}\sum_{j=1}^m|\langle K_{x_j}K_{x_j}^*f_k,f_u\rangle_{\mathcal{H}}|=\frac{1}{m}\sum_{j=1}^m|\langle T_{x_j}f_k,f_u\rangle_{\mathcal{H}}|$$
$$\le\frac{1}{m}\sum_{j=1}^m\|T_{x_j}\|_{\mathcal{L}(\mathcal{H})}\|f_k\|_{\mathcal{H}}\|f_u\|_{\mathcal{H}}\le\kappa\gamma_l^2,$$
where we used the reproducing property (5) to obtain the equality in the first line and (10) for the last inequality. Now assume that $m$ is so large that with probability $1-\delta$ we have
$$(21)\quad \|G-\tilde{G}\|_{\mathcal{L}(\mathbb{R}^l)}<\frac{1}{\|\tilde{G}^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}}.$$
Moreover, we can use the following simple manipulation:
$$G^{-1}=\tilde{G}^{-1}(G\tilde{G}^{-1})^{-1}=\tilde{G}^{-1}\big(I-(I-G\tilde{G}^{-1})\big)^{-1}=\tilde{G}^{-1}\big(I-(\tilde{G}-G)\tilde{G}^{-1}\big)^{-1}.$$
Then (21) ensures that the Neumann series for $\big(I-(\tilde{G}-G)\tilde{G}^{-1}\big)^{-1}$ converges and we obtain the following bound:
$$(22)\quad \|G^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}\le\frac{\|\tilde{G}^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}}{1-\|\tilde{G}^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}\|G-\tilde{G}\|_{\mathcal{L}(\mathbb{R}^l)}}=O(1).$$
Now we are in the position to prove our main generalization bound (4) for unsupervised domain adaptation:
Proof of Theorem 1. We have already discussed that the coefficients in the best approximation $f$ to $f_q$ are given by $c=(c_1,c_2,\ldots,c_l)=G^{-1}\bar{g}$. Since
$$G^{-1}(\tilde{g}-\bar{g})+G^{-1}(G-\tilde{G})\tilde{c}=G^{-1}\tilde{g}-c+\tilde{c}-G^{-1}\tilde{g}=\tilde{c}-c$$
(here we used $c=G^{-1}\bar{g}$ and $\tilde{G}\tilde{c}=\tilde{g}$),
then from (19)-(22), with probability $1-\delta$ we have
$$\|\tilde{c}-c\|_{\mathbb{R}^l}\le\|G^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}\big(\|\tilde{g}-\bar{g}\|_{\mathbb{R}^l}+\|G-\tilde{G}\|_{\mathcal{L}(\mathbb{R}^l)}\|\tilde{c}\|_{\mathbb{R}^l}\big)$$
$$(23)\quad \le C\Big(\log^{1/2}\frac{1}{\delta}\Big)(n^{-1/2}+m^{-1/2}).$$
Moreover:
$$R_q(\tilde{f})-R_q(f_q)=\|T_{q_X}^{1/2}(\tilde{f}-f_q)\|^2_{\mathcal{H}}$$
$$\le\big(\|T_{q_X}^{1/2}(f-f_q)\|_{\mathcal{H}}+\|T_{q_X}^{1/2}(\tilde{f}-f)\|_{\mathcal{H}}\big)^2$$
$$\le 2\|T_{q_X}^{1/2}(f-f_q)\|^2_{\mathcal{H}}+2\|T_{q_X}^{1/2}(\tilde{f}-f)\|^2_{\mathcal{H}}$$
$$=2(R_q(f)-R_q(f_q))+2\|T_{q_X}^{1/2}(\tilde{f}-f)\|^2_{\mathcal{H}}$$
$$\le 2(R_q(f)-R_q(f_q))+2\Big(\sum_{k=1}^l|c_k-\tilde{c}_k|\,\|T_{q_X}^{1/2}f_k\|_{\mathcal{H}}\Big)^2$$
$$\le 2(R_q(f)-R_q(f_q))+2l\,\|c-\tilde{c}\|^2_{\mathbb{R}^l}\max_k\|T_{q_X}^{1/2}f_k\|^2_{\mathcal{H}}$$
$$(24)\quad \le 2(R_q(f)-R_q(f_q))+2\|T_{q_X}^{1/2}\|^2_{\mathcal{L}(\mathcal{H})}\,l\,\gamma_l^2\,\|c-\tilde{c}\|^2_{\mathbb{R}^l},$$
where we used the elementary inequality $(a+b)^2\le 2(a^2+b^2)$.
On the dependence of the error bound on the number $l$ of models  An interesting question is how the bound in Eq. (4) depends on $l$. To this end, let us have a look at the second term in the last line of Eq. (24): $\|T_{q_X}^{1/2}\|_{\mathcal{L}(\mathcal{H})}$ concerns the sampling operator and thus does not depend on the number of models; the same goes for $\gamma_l$, which is just a uniform bound on all our models. To analyze $\|c-\tilde{c}\|^2_{\mathbb{R}^l}$, let us have a look at the individual factors in the first inequality of Eq. (23). To not overload notation, $C>0$ is used here for any absolute constant that is independent of $l$, $m$, $n$ and $\delta$.
  • $\|G-\tilde{G}\|_{\mathcal{L}(\mathbb{R}^l)}$: The individual entries of $G-\tilde{G}$ are (in absolute value) bounded by Lemma 5. The proof arguments only involve norm bounds on the associated sampling operators, the uniform bound $\gamma_l$ on all the models, and the bound $B$ on $\beta$; thus the absolute constant $C$ there is independent of $l$. By the definition of the matrix Frobenius norm, we thus get
$$(25)\quad \|G-\tilde{G}\|_{\mathcal{L}(\mathbb{R}^l)}\le C\,l\,\Big(\log^{1/2}\frac{1}{\delta}\Big)(n^{-1/2}+m^{-1/2}).$$
  • $\|\tilde{g}-\bar{g}\|_{\mathbb{R}^l}$: Similar arguments as before lead to $\|\tilde{g}-\bar{g}\|_{\mathbb{R}^l}\le C\,l\,\big(\log^{1/2}\frac{1}{\delta}\big)(n^{-1/2}+m^{-1/2})$.
  • $\|G^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}$: It is natural to assume that there is some constant $c\ge\|\tilde{G}^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}$. Otherwise we can, e.g., orthogonalize our models and coefficients without changing the aggregation, but with a reduced condition number (i.e., with reduced $\|\tilde{G}^{-1}\|$). It is also natural to assume that $m$ and $n$ are large enough such that $l(n^{-1/2}+m^{-1/2})<\frac{1}{2c}$. Then, applying Eq. (25) to Eq. (22), we can deduce that $\|G^{-1}\|_{\mathcal{L}(\mathbb{R}^l)}\le 2c$.
  • $\|\tilde{c}\|_{\mathbb{R}^l}$: This quantity can also be assumed to be known independently of $l$, since it is given by our data only.
Combining the previous points gives us $\|c-\tilde{c}\|^2_{\mathbb{R}^l}\le C\,l^2\,\big(\log\frac{1}{\delta}\big)(n^{-1}+m^{-1})$, which finally leads to
$$R_q(\tilde{f})-R_q(f_q)\le 2(R_q(f)-R_q(f_q))+C\,l^3\,\Big(\log\frac{1}{\delta}\Big)(n^{-1}+m^{-1})$$
for sufficiently large $l$, $m$ and $n$ and error probability $\delta>0$.

B CONSTRUCTION OF FUNCTION SPACES

Let us give a short discussion on the construction of the required function space mentioned in the previous Section A, the reproducing kernel Hilbert space $\mathcal{H}$. As mentioned already in the main text, explicit knowledge of $\mathcal{H}$ is not required; we just need to rely on its existence. First, any of our models $f$ can be regarded as an element of some reproducing kernel Hilbert space (RKHS) $\tilde{\mathcal{H}}$ satisfying the assumptions of Hypothesis 1. This is immediate if $f: X\to\mathbb{R}$ is a real-valued continuous function and we take $k(x,y)=f(x)f(y)$ as the associated reproducing kernel. In the case $f: X\to Y$ with $Y$ finite-dimensional, it is not hard to see that a similar construction is possible, as this case can again be boiled down to the construction of a kernel with real-valued output; see e.g. Remark 1 in Caponnetto & De Vito (2007) for details.
Overall we end up with a finite sequence of spaces $(\mathcal{H}_k)_{k=1}^{l+1}$ of functions living on the same domain $X$ (we have $l+1$ spaces as we also take into account the regression function), and the existence of an RKHS containing all given models and the regression function is not a real restriction. For example, in the case of real-valued functions, this assumption is automatically satisfied, as linear combinations of functions with the same domain which stem from a finite sequence of RKHSs belong to an RKHS. This follows from a classical result by N. Aronszajn and R. Godement, see e.g. Pereverzyev (2022, Theorem 1.4).
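For a single real-valued continuous model $f\not\equiv 0$, the one-model construction can be written out explicitly (a standard observation, sketched here in our notation):

```latex
% One-model RKHS with kernel k(x,t) = f(x) f(t), assuming f \not\equiv 0:
\tilde{\mathcal{H}} = \{\, c f : c \in \mathbb{R} \,\}, \qquad
\langle c_1 f, c_2 f \rangle_{\tilde{\mathcal{H}}} = c_1 c_2 .
% Then k(\cdot, x) = f(x)\, f \in \tilde{\mathcal{H}}, and the reproducing
% property of Hypothesis 1 holds:
\langle c f, k(\cdot, x) \rangle_{\tilde{\mathcal{H}}} = c\, f(x) = (c f)(x).
```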
There is also ongoing research on constructing function spaces (and especially associated reproducing kernels) for neural networks, e.g. for ReLU networks, for recurrent networks (Fermanian et al., 2021), and for convolutional neural networks (Bietti & Mairal, 2017; 2019). Incorporating these into our work may lead to refined generalization bounds that also reflect the nature of our models. We leave the details open for future work.

C DATASETS

This section provides an overview over all applied datasets from language, image, and time-series domains.
Illustrative example: For the illustrative example (Figure 1 in the main paper) we rely on the following setting, taken from Shimodaira (2000); Sugiyama et al. (2007); You et al. (2019): The data points are labelled with $y=\frac{\sin(\pi x)}{\pi x}$ plus random noise sampled from the normal distribution $\mathcal{N}(0,(1/4)^2)$. Moreover, $p_X\sim\mathcal{N}(1,1/4)$ and $q_X\sim\mathcal{N}(2,(1/4)^2)$. The density ratio $\beta$ can be computed analytically and is bounded. We aggregate several linear models with our approach and compare it to the optimal linear model, whose coefficients have been evaluated using a computer algebra system.
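The setting above can be sketched in a few lines (parameter values as stated; variable names are ours). Because the target variance is smaller than the source variance, the Gaussian density ratio is bounded:

```python
import numpy as np

# Toy covariate-shift setting: y = sin(pi x)/(pi x) + noise,
# source p_X = N(1, 1/4) (std 1/2), target q_X = N(2, (1/4)^2) (std 1/4).
rng = np.random.default_rng(3)
xs = rng.normal(1.0, 0.5, size=200)                   # source inputs
ys = np.sinc(xs) + rng.normal(0.0, 0.25, size=200)    # np.sinc(x) = sin(pi x)/(pi x)
xt = rng.normal(2.0, 0.25, size=200)                  # unlabeled target inputs

def beta(x, mu_p=1.0, s_p=0.5, mu_q=2.0, s_q=0.25):
    # Analytic density ratio dq_X/dp_X of two Gaussians; bounded since s_q < s_p.
    return (s_p / s_q) * np.exp((x - mu_p) ** 2 / (2 * s_p ** 2)
                                - (x - mu_q) ** 2 / (2 * s_q ** 2))
```

A short calculation shows the exponent is a concave quadratic in $x$, so $\beta$ attains a finite maximum (here roughly $2e^{8/3}\approx 28.8$) and the bounded-density-ratio assumption holds.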
Academic Dataset We rely on the Transformed Moons dataset (Zellinger et al., 2021), allowing us to visualize and address low-dimensional input data. The dataset consists of two-dimensional input data points forming two classes with a "moon-shaped" support. The shift from source to target domain is simulated by a transformation in input space as depicted in Figure 3. The results are shown in the following table:
Language Dataset  To evaluate our method on a language task, we rely on the Amazon Reviews dataset (Blitzer et al., 2006). This dataset consists of text reviews from four domains: books (B), DVDs (D), electronics (E), and kitchen appliances (K). Reviews are encoded in 5000-dimensional feature vectors of bag-of-words unigrams and bigrams with binary labels derived from the review's star rating. We obtain twelve domain adaptation tasks, one for each ordered pair of distinct categories serving as source and target domain (e.g., see Table 15). We follow similar data splits as previous works (Chen et al., 2012; Louizos et al., 2016; Ganin et al., 2016). In particular, we use 4000 labeled source examples and 4000 unlabeled target examples for training, and over 1000 examples for testing.
Table 2: Mean and standard deviation (after ±) of target classification accuracy on the Transformed Moons dataset over three different random initializations of model weights and 11 domain adaptation methods.
Method | Transformed Moons
Heuristic | Theoretical error guarantees
SOTMVTMRTCRSORIWVDEVIWA (ours)TB
HoMM | 0.994(±0.003) | 0.994(±0.000) | 1.000(±0.003) | 1.000(±0.003) | 0.985(±0.012) | 0.994(±0.003) | 0.994(±0.003) | 0.992(±0.000) | 0.998(±0.000)
AdvSKM | 0.985(±0.000) | 1.000(±0.000) | 0.998(±0.000) | 0.998(±0.000) | 0.987(±0.014) | 0.985(±0.000) | 0.985(±0.000) | 1.000(±0.000) | 1.000(±0.000)
DIRT | 0.964(±0.009) | 0.827(±0.018) | 0.830(±0.017) | 0.981(±0.003) | 0.990(±0.006) | 0.964(±0.009) | 0.869(±0.009) | 0.983(±0.003) | 0.966(±0.000)
DDC | 0.994(±0.023) | 0.998(±0.000) | 0.996(±0.000) | 0.996(±0.000) | 0.981(±0.009) | 0.994(±0.023) | 0.994(±0.023) | 0.998(±0.000) | 0.998(±0.000)
CMD | 0.990(±0.006) | 0.956(±0.000) | 0.964(±0.000) | 0.996(±0.000) | 0.994(±0.006) | 0.990(±0.006) | 0.990(±0.006) | 0.996(±0.000) | 1.000(±0.000)
MMDA | 0.990(±0.006) | 1.000(±0.003) | 1.000(±0.003) | 1.000(±0.003) | 0.996(±0.009) | 0.990(±0.006) | 0.990(±0.006) | 1.000(±0.003) | 1.000(±0.003)
CoDATS | 0.979(±0.016) | 1.000(±0.044) | 1.000(±0.052) | 1.000(±0.016) | 0.981(±0.012) | 0.979(±0.016) | 0.979(±0.174) | 1.000(±0.010) | 1.000(±0.045)
Deep-Coral | 1.000(±0.009) | 1.000(±0.000) | 1.000(±0.000) | 1.000(±0.000) | 0.985(±0.000) | 1.000(±0.009) | 1.000(±0.009) | 1.000(±0.000) | 1.000(±0.000)
CDAN | 1.000(±0.000) | 1.000(±0.000) | 1.000(±0.000) | 1.000(±0.000) | 0.985(±0.026) | 1.000(±0.000) | 1.000(±0.000) | 1.000(±0.000) | 1.000(±0.000)
DANN | 0.994(±0.006) | 1.000(±0.000) | 1.000(±0.000) | 1.000(±0.000) | 0.994(±0.016) | 0.994(±0.006) | 0.994(±0.006) | 1.000(±0.003) | 1.000(±0.003)
DSAN | 0.990(±0.009) | 1.000(±0.000) | 1.000(±0.000) | 1.000(±0.000) | 1.000(±0.003) | 0.990(±0.009) | 0.990(±0.009) | 1.000(±0.000) | 1.000(±0.000)
Avg. | 0.989(±0.008) | 0.980(±0.006) | 0.981(±0.007) | 0.997(±0.002) | 0.989(±0.010) | 0.989(±0.008) | 0.981(±0.022) | 0.997(±0.002) | 0.997(±0.005)
Figure 3: Transformed Moons dataset. Source data is depicted as blue + and orange × . Target data points are shown as black dots.
Image Dataset  Our third dataset is MiniDomainNet, which is based on the DomainNet-2019 dataset (Peng et al., 2019) consisting of six different image domains (Quickdraw: Q, Real: R, Clipart: C, Sketch: S, Infograph: I, and Painting: P). We follow Zellinger et al. (2021) and rely on the reduced version of DomainNet-2019, referred to as MiniDomainNet, which reduces the number of classes to the top-five largest representatives in the training set across all six domains. To further reduce computation time, we rely on an ImageNet (Krizhevsky et al., 2012) pre-trained ResNet-18 (He et al., 2016) backbone. Therefore, we assume that the backbone has learned lower-level filters suitable for the "Real" image category, and we only need to adapt to the remaining five domains (e.g., Clipart, Sketch). This results in five domain adaptation tasks.
Time-Series Dataset  We based our time-series experiments on the four datasets included in the AdaTime benchmark suite (Ragab et al., 2022): UCI-HAR, WISDM, HHAR, and Sleep-EDF. The suite includes four representative datasets spanning 20 cross-domain real-world scenarios, i.e., human activity recognition and sleep stage classification. The first dataset is the Human Activity Recognition (HAR) dataset (Anguita et al., 2013) from the UC Irvine Repository, denoted as UCI-HAR, which contains data from three motion sensors (accelerometer, gyroscope and body-worn sensors) gathered using smartphones from 30 different subjects. It classifies their activities into several categories, namely walking, walking upstairs, walking downstairs, standing, sitting, and lying down. The WISDM (Kwapisz et al., 2011) dataset is a class-imbalanced variant collected from accelerometer sensors, including GPS data, from 29 different subjects performing similar activities as in the UCI-HAR dataset. The Heterogeneity Human Activity Recognition (HHAR) (Stisen et al., 2015) dataset investigates sensor-, device- and workload-specific heterogeneities using 36 smartphones and smartwatches, consisting of 13 different device models from four manufacturers. Finally, the sleep stage classification time-series setting aims to classify electroencephalography
时间序列数据集 我们的时间序列实验基于 AdaTime 基准套件中包含的四个数据集(Ragab et al., 2022),该套件包括 UCI-HAR、WISDM、HHAR 和 Sleep-EDF。该套件包含四个代表性数据集,涵盖 20 个跨领域的真实场景,即人类活动识别和睡眠阶段分类。第一个数据集是来自加州大学欧文分校库的人类活动识别(HAR)数据集(Anguita et al., 2013),标记为 UCI-HAR,包含来自 30 个不同受试者的三种运动传感器(加速度计、陀螺仪和身体佩戴传感器)收集的数据。它将他们的活动分类为几类,即走路、上楼、下楼、站立、坐着和躺下。WISDM(Kwapisz et al., 2011)数据集是一个类别不平衡的变体,来自 29 个不同受试者收集的加速度计传感器,包括 GPS 数据,这些受试者执行的活动与 UCI-HAR 数据集中的活动相似。异质人类活动识别(HHAR)(Stisen et al., 2015)数据集使用 36 部智能手机和智能手表,涵盖来自四个制造商的 13 种不同设备型号,研究传感器、设备和工作负载特定的异质性。最后,睡眠阶段分类时间序列设置旨在对脑电图进行分类。
(EEG) signals into five stages i.e., Wake (W), Non-Rapid Eye Movement stages (N1, N2, N3), and Rapid Eye Movement (REM). Analogous to Ragab et al. (2022); Eldele et al. (2021), we adopt the Sleep-EDF-20 dataset obtained from PhysioBank (Goldberger et al., 2000), which contains EEG readings from 20 healthy subjects. For all datasets, each subject is treated as an own domain, and adopt from a source subject to a target subject.
(EEG) 信号分为五个阶段,即清醒 (W)、非快速眼动阶段 (N1, N2, N3) 和快速眼动 (REM)。类似于 Ragab 等人 (2022);Eldele 等人 (2021),我们采用了从 PhysioBank (Goldberger 等人, 2000) 获取的 Sleep-EDF-20 数据集,该数据集包含来自 20 名健康受试者的 EEG 读数。对于所有数据集,每个受试者被视为一个独立的领域,并从源受试者适应到目标受试者。

D EXPERIMENTAL SETUP

This section provides further details on the overall computational setting of our experiments. We start with an overview of the computational resources used for the specific datasets and of the implementation tools. Next, we describe the network architectures for the individual datasets in greater detail. In the third subsection we elaborate on the construction of our model sequences, and the fourth subsection is devoted to matrix inversion. Finally, in the last subsections, we describe the detailed empirical results and give the complete tables.

D.1 COMPUTATIONAL RESOURCES AND IMPLEMENTATIONS

Overall, to compute the results in our tables, we trained 16680 models with an approximate computational budget of 1500 GPU-hours on one high-performance computing station with 8× NVIDIA P100 16GB, 512GB RAM, and a 40-core Xeon(R) CPU E5-2698 v4 @ 2.20GHz running CentOS Linux 7.
Transformed Moons: 11 methods × 14 parameters × 1 domain adaptation task × 3 seeds + 3 density estimator classifiers = 465 trained models
Amazon Reviews: 11 methods × 14 parameters × 12 domain adaptation tasks × 3 seeds + 12×3 density estimator classifiers = 5580 trained models
MiniDomainNet: 11 methods × 8 parameters × 5 domain adaptation tasks × 3 seeds + 5×3 density estimator classifiers = 1335 trained models
UCI-HAR: 11 methods × 14 parameters × 5 domain adaptation tasks × 3 seeds + 5×3 density estimator classifiers = 2325 trained models
Sleep-EDF: 11 methods × 14 parameters × 5 domain adaptation tasks × 3 seeds + 5×3 density estimator classifiers = 2325 trained models
HHAR: 11 methods × 14 parameters × 5 domain adaptation tasks × 3 seeds + 5×3 density estimator classifiers = 2325 trained models
WISDM: 11 methods × 14 parameters × 5 domain adaptation tasks × 3 seeds + 5×3 density estimator classifiers = 2325 trained models
In total: 465 + 5580 + 1335 + 4 × 2325 = 16680 trained models
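The bookkeeping above reduces to one formula per dataset: methods × parameters × tasks × seeds, plus one density-estimator classifier per task and seed. A minimal sketch verifying the arithmetic:

```python
# Sanity check of the trained-model counts reported above:
# methods x parameters x tasks x seeds, plus (tasks x seeds) density estimators.
def n_models(methods, params, tasks, seeds):
    return methods * params * tasks * seeds + tasks * seeds

counts = {
    "Transformed Moons": n_models(11, 14, 1, 3),   # 465
    "Amazon Reviews":    n_models(11, 14, 12, 3),  # 5580
    "MiniDomainNet":     n_models(11, 8, 5, 3),    # 1335
    "UCI-HAR":           n_models(11, 14, 5, 3),   # 2325
    "Sleep-EDF":         n_models(11, 14, 5, 3),   # 2325
    "HHAR":              n_models(11, 14, 5, 3),   # 2325
    "WISDM":             n_models(11, 14, 5, 3),   # 2325
}
total = sum(counts.values())  # 16680
```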
All methods have been implemented in Python using the PyTorch (Paszke et al., 2017, BSD license) library. For monitoring the runs we used Weights & Biases (Biewald, 2020, MIT license). We use the Scikit-learn (Pedregosa et al., 2011) library for evaluation measures and toy datasets, and the TQDM (da Costa-Luis, 2019) library and Tensorboard (Abadi et al., 2015) for keeping track of the progress of our experiments. We built parts of our implementation on the codebases of Zellinger et al. (2021, MIT License) and Ragab et al. (2022, MIT License).

D.2 ARCHITECTURES AND TRAINING SETUP

In this subsection, we provide details on the model architectures and the training setup for every dataset. Our base architectures build on the AdaTime benchmark suite, a large-scale evaluation of domain adaptation algorithms on time-series data. We extended the benchmark suite to support 11 state-of-the-art model architectures on multiple dataset types, ranging from language and image to time-series data, addressed by Transformed Moons, Amazon Reviews, MiniDomainNet, and the four time-series datasets (UCI-HAR, WISDM, HHAR, and Sleep-EDF), spanning in total 38 cross-domain real-world scenarios.
Transformed Moons For the Transformed Moons dataset we use two sequential blocks with fully-connected layers, 1D-BatchNorm, ReLU activation functions, and Dropout. The full architecture specification can be found in Table 3. The domain classifier (density ratio estimator) uses the same architecture. We train the class prediction models for 50 epochs and the domain classifier for 80 epochs with a learning rate of 0.001, a weight decay of 0.0001, and a batch size of 128 using the Adam optimizer (Kingma & Ba, 2014). We share the same base architecture and training setup across every domain adaptation method (e.g., DANN, HoMM, CMD). Additional hyper-parameters are reported in Table 9.
Table 3: Model architecture for the Transformed Moons dataset. The values for neural network layers correspond to the number of output units.
Architecture
MLP Block 1: Input units (2); Fully-connected Layer (128); Batch Normalization 1D Layer (128); ReLU Activation; Dropout
MLP Block 2: Fully-connected Layer (128); Batch Normalization 1D Layer (128); ReLU Activation; Dropout
Output: Fully-connected Layer (128)
Methods: See Table 7 and Table 8 for details.
Amazon Reviews For the Amazon Reviews dataset we use two sequential blocks with fully-connected layers, 1D-BatchNorm, ReLU activation functions, and Dropout, analogous to the setup for Transformed Moons. We also use the same architecture for the domain classifier. We train the class prediction models for 50 epochs and the domain classifier for 80 epochs with a learning rate of 0.001, a weight decay of 0.0001, and a batch size of 128 using the Adam optimizer (Kingma & Ba, 2014). We share the same base architecture and training setup across every domain adaptation method (e.g., DANN, HoMM, CMD). Additional hyper-parameters are reported in Table 9.
Table 4: Model architecture for the Amazon Reviews dataset. The values for neural network layers correspond to the number of output units.
Architecture
MLP Block 1: Input units (5000); Fully-connected Layer (128); Batch Normalization 1D Layer (128); ReLU Activation; Dropout
MLP Block 2: Fully-connected Layer (128); Batch Normalization 1D Layer (128); ReLU Activation; Dropout
Output: Fully-connected Layer (128)
Methods: See Table 7 and Table 8 for details.
MiniDomainNet Following the pre-trained setup from Peng et al. (2019), we use a frozen ResNet-18 backbone model trained on ImageNet and operate subsequent computations on the 512-dimensional extracted features. To alleviate overfitting effects on pre-computed features, we perform data augmentation on the images of each batch and forward each batch through the backbone. We incorporate zero padding before resizing the images to 256×256 to avoid image distortions. Furthermore, in alignment with data augmentation techniques from Shorten & Khoshgoftaar (2019), we perform random resized cropping to 224×224 with a random viewport between 70% and 100% of the original image, random horizontal flipping, color jittering of 0.25% on each RGB channel, and a ±2 degree rotation.
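The zero-padding step above can be sketched as follows. This is a minimal NumPy illustration of padding an image to a square canvas so that a subsequent resize to 256×256 preserves the aspect ratio; the resize itself and the remaining augmentations would be handled by an image library, and the function name is illustrative, not from our codebase.

```python
import numpy as np

def pad_to_square(img: np.ndarray) -> np.ndarray:
    """Zero-pad an H x W x C image to size x size x C with size = max(H, W),
    centering the original content, so that a subsequent resize to a square
    resolution (e.g., 256 x 256) does not distort the aspect ratio."""
    h, w = img.shape[:2]
    size = max(h, w)
    top = (size - h) // 2
    left = (size - w) // 2
    out = np.zeros((size, size) + img.shape[2:], dtype=img.dtype)
    out[top:top + h, left:left + w] = img
    return out
```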
After the ResNet-18 backbone output, we add a projection layer and define the domain adaptation layers on which we use the domain adaptation methods to align the representations. The backbone and projection layers form a common architecture shared across the different domain adaptation methods. Additional layers are added for the classification networks according to the requirements of the individual domain adaptation methods (e.g., CMD, HoMM). The number of layers/neurons in the upper layers of our architecture has been tuned to achieve the best performance in the source-only setup. See Table 5 for a detailed description of the architecture used. We run all listed methods with 3 repetitions based on different random weight initializations. All class prediction models have been trained for 60 epochs and domain classifiers for 100 epochs with the Adam optimizer, a learning rate of 0.001, β1 = 0.9, β2 = 0.999, a batch size of 128, and a weight decay of 0.0001. Additional hyper-parameters are reported in Table 9.
Table 5: Model architecture for the MiniDomainNet dataset. The values for neural network layers correspond to the number of output units.
Architecture
Backbone Output Layer: ResNet-18 (Adaptive Average Pooling Layer) (512)
Fully-connected Layer (128)
Methods: See Table 7 and Table 8 for details.
AdaTime Unless stated otherwise, we follow the implementation and hyper-parameter settings reported in Ragab et al. (2022). We extended the AdaTime suite to comprise a collection of 11 domain adaptation algorithms. We trained all domain adaptation models according to the following approaches (see also Table 6, Table 7, and Table 8): Deep Domain Confusion (DDC) (Tzeng et al., 2014), Correlation Alignment via Deep Neural Networks (Deep-Coral) (Sun et al., 2017), Higher-order Moment Matching (HoMM) (Chen et al., 2020), Minimum Discrepancy Estimation for Deep Domain Adaptation (MMDA) (Rahman et al., 2020), Central Moment Discrepancy (CMD) (Zellinger et al., 2017), Deep Subdomain Adaptation (DSAN) (Zhu et al., 2021), Domain-Adversarial Neural Networks (DANN) (Ganin et al., 2016), Conditional Adversarial Domain Adaptation (CDAN) (Long et al., 2018), A DIRT-T Approach to Unsupervised Domain Adaptation (DIRT) (Shu et al., 2018), Convolutional deep Domain Adaptation model for Time-Series data (CoDATS) (Wilson et al., 2020), and Adversarial Spectral Kernel Matching (AdvSKM) (Liu & Xue, 2021). The backbone architecture of all models is a 1D-CNN network. It consists of three CNN blocks, each with 1D pooling and Dropout. In the first block, the kernel size of the convolutional layer is set per dataset as reported in Ragab et al. (2022). After the convolutional blocks, we apply a 1D adaptive pooling layer. All methods are trained for 100 epochs on all datasets. The batch size is 32, except for Sleep-EDF, where we use a batch size of 128. All models are trained with the Adam optimizer (Kingma & Ba, 2014) and a weight decay of 10^-4. Additional hyper-parameters are reported in Table 10.

D.3 MODEL SEQUENCE
D.3 模型序列

Our algorithm, IWA, constructs an ensemble from a sequence of different classifiers, e.g., obtained from a sequence of possible hyper-parameter configurations in domain adaptation algorithms. We apply IWA to all datasets with different hyper-parameter choices.
Table 6: Model backbone for the AdaTime suite. Kernel size, stride, output channels of the convolutional layers are dataset dependent and are chosen according to Ragab et al. (2022).
Architecture
Conv Block 1: Convolutional 1D Layer; Batch Normalization 1D Layer; ReLU Activation; Max Pooling 1D Layer; Dropout
Conv Block 2: Convolutional 1D Layer; Batch Normalization 1D Layer; ReLU Activation; Max Pooling 1D Layer; Dropout
Conv Block 3: Convolutional 1D Layer; Batch Normalization 1D Layer; ReLU Activation; Max Pooling 1D Layer; Dropout
Adaptive Pooling 1D Layer
Methods: See Table 7 and Table 8 for details.
Table 7: Model architecture for the AdaTime dataset. Layer hyper-parameters are dataset dependent and are chosen according to Ragab et al. (2022).
Method Architectures (Part 1)
DANN
  Class Output Head: Fully-connected Layer
  Domain Classifier Head: Fully-connected Layer; ReLU Activation; Fully-connected Layer; ReLU Activation; Fully-connected Layer
DeepCoral
  Class Output Head: Fully-connected Layer
DDC
  Class Output Head: Fully-connected Layer
HoMM
  Class Output Head: Fully-connected Layer
CoDATS
  Class Output Head: Fully-connected Layer; ReLU Activation; Fully-connected Layer; ReLU Activation; Fully-connected Layer
DSAN
  Class Output Head: Fully-connected Layer
For the experiments on the language, image, and academic datasets, the values of the hyper-parameters are shown in Table 9. To obtain a good sequence of values, for all settings except MiniDomainNet we multiply each parameter by
Table 8: Model architecture for the AdaTime dataset. Hyper-parameters are dataset dependent and are chosen according to Ragab et al. (2022).
Method Architectures (Part 2)
AdvSKM
  Class Output Head: Fully-connected Layer
  AdvSKM Embedder 1: Fully-connected Layer; Fully-connected Layer; Batch Normalization 1D Layer; Cosine Activation; Fully-connected Layer; Fully-connected Layer; Batch Normalization 1D Layer; Cosine Activation
  AdvSKM Embedder 2: Fully-connected Layer; Fully-connected Layer; Batch Normalization 1D Layer; ReLU Activation; Fully-connected Layer; Fully-connected Layer; Batch Normalization 1D Layer; ReLU Activation
MMDA
  Class Output Head: Fully-connected Layer
CMD
  Class Output Head: Fully-connected Layer
CDAN
  Class Output Head: Fully-connected Layer
  Domain Classifier Head: Fully-connected Layer; ReLU Activation; Fully-connected Layer; ReLU Activation; Fully-connected Layer
DIRT
  Class Output Head: Fully-connected Layer
  Domain Classifier Head: Fully-connected Layer; ReLU Activation; Fully-connected Layer; ReLU Activation; Fully-connected Layer
λ{0,0.0001,0.001,0.01,0.05,0.1,0.25,0.5,0.75,1,1.5,2,5,10} . In this way,we generate a sequence of 14 hyper-parameter choices. Due to computational limitations, in MiniDomainNet we use λ{0,0.0001,0.001,0.01,0.1,1,5,10} . All values are listed in Table 9 and Table 10.
λ{0,0.0001,0.001,0.01,0.05,0.1,0.25,0.5,0.75,1,1.5,2,5,10}。通过这种方式,我们生成了一系列 14 个超参数选择。由于计算限制,在 MiniDomainNet 中我们使用 λ{0,0.0001,0.001,0.01,0.1,1,5,10}。所有值列在表 9 和表 10 中。
Table 9: Domain adaptation hyper-parameter sequences for experiments on the datasets Transformed Moons, Amazon Reviews, and MiniDomainNet. We multiply each hyper-parameter with the set of scaling factors λ ∈ {0, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 5, 10}, reduced to λ ∈ {0, 0.0001, 0.001, 0.01, 0.1, 1, 5, 10} for MiniDomainNet.
Method      Hyper-parameter                    Value (identical for Transformed Moons, Amazon Reviews, and MiniDomainNet)
DANN        Classification loss weight         0.9603
            Domain loss weight                 λ × 0.9238
DeepCoral   Classification loss weight         0.05931
            Coral loss weight                  λ × 8.452
DDC         Classification loss weight         0.1593
            MMD loss weight                    λ × 0.2048
CMD         Classification loss weight         0.96
            CMD loss weight                    λ × 5.52
HoMM        Classification loss weight         0.2429
            Higher-order-MMD loss weight       λ × 0.9824
CoDATS      Classification loss weight         0.5416
            Adversarial loss weight            λ × 0.5582
DSAN        Classification loss weight         0.4133
            Local MMD loss weight              λ × 0.16
AdvSKM      Classification loss weight         0.4637
            Adversarial MMD loss weight        λ × 0.1511
MMDA        Classification loss weight         0.9505
            MMD loss weight                    λ × 0.5476
            Conditional loss weight            λ × 0.5167
            Coral loss weight                  λ × 0.5838
CDAN        Classification loss weight         0.6636
            Adversarial loss weight            λ × 0.1954
            Conditional loss weight            λ × 0.0124
DIRT        Classification loss weight         0.9752
            Adversarial loss weight            λ × 0.3892
            Conditional loss weight            λ × 0.09228
            Virtual adversarial loss weight    λ × 0.1947
Table 10: Domain adaptation hyper-parameters for experiments on the time-series data. We multiply each hyper-parameter with the set of scaling factors λ ∈ {0, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 5, 10} to obtain a sequence.
Method      Hyper-parameter                    UCI-HAR     Sleep-EDF   WISDM       HHAR
DANN        Classification loss weight         9.74        8.3         5.613       0.9603
            Domain loss weight                 λ × 5.43    λ × 0.324   λ × 1.857   λ × 0.9238
DeepCoral   Classification loss weight         8.67        9.39        8.876       0.05931
            Coral loss weight                  λ × 0.44    λ × 0.19    λ × 5.56    λ × 8.452
DDC         Classification loss weight         6.24        2.951       7.01        0.1593
            MMD loss weight                    λ × 6.36    λ × 8.923   λ × 7.595   λ × 0.2048
CMD         Classification loss weight         0.96        0.96        0.96        0.96
            CMD loss weight                    λ × 5.52    λ × 5.52    λ × 5.52    λ × 5.52
HoMM        Classification loss weight         2.15        0.197       0.1913      0.2429
            Higher-order-MMD loss weight       λ × 9.13    λ × 1.102   λ × 4.239   λ × 0.9824
CoDATS      Classification loss weight         6.21        9.239       7.187       0.5416
            Adversarial loss weight            λ × 1.72    λ × 1.342   λ × 6.439   λ × 0.5582
DSAN        Classification loss weight         1.76        6.713       0.1         0.4133
            Local MMD loss weight              λ × 1.59    λ × 6.708   λ × 0.1     λ × 0.16
AdvSKM      Classification loss weight         3.05        2.5         3.05        0.4637
            Adversarial MMD loss weight        λ × 2.876   λ × 2.5     λ × 2.876   λ × 0.1511
MMDA        Classification loss weight         6.13        4.48        0.1         0.9505
            MMD loss weight                    λ × 2.37    λ × 5.951   λ × 0.1     λ × 0.5476
            Conditional loss weight            λ × 7.16    λ × 6.13    λ × 0.4753  λ × 0.5167
            Coral loss weight                  λ × 8.63    λ × 3.36    λ × 0.1     λ × 0.5838
CDAN        Classification loss weight         5.19        6.803       9.54        0.6636
            Adversarial loss weight            λ × 2.91    λ × 4.726   λ × 3.283   λ × 0.1954
            Conditional loss weight            λ × 1.73    λ × 1.307   λ × 0.1     λ × 0.0124
DIRT        Classification loss weight         7.0         9.183       0.1         0.9752
            Adversarial loss weight            λ × 4.51    λ × 7.411   λ × 0.1     λ × 0.3892
            Conditional loss weight            λ × 0.79    λ × 2.564   λ × 0.1     λ × 0.09228
            Virtual adversarial loss weight    λ × 9.31    λ × 3.583   λ × 0.1     λ × 0.1947

D.4 MATRIX INVERSION

Matrix inversion is a well-known delicate numerical task, especially in cases of limited computing precision and ill-conditioned matrices. In our case, similar models in the given sequence can cause numerical instability due to limited compute precision. That is, occasionally a tabula rasa inversion of the matrix G~ in Algorithm 1 is numerically unstable. Various standard approaches can be applied to handle this common issue, including the exclusion of similar models and various regularization techniques. In our computational setup, we rely on the Python routine numpy.linalg.pinv, which is based on the eigendecomposition of G~ (coinciding with the singular value decomposition in our case due to positive-definiteness) and an eigenvalue-based regularization that discards eigenvalues below a threshold value rcond, see Strang (1980, pages 138-140) for details. The choice of rcond depends on the scale of the Gram matrix and can therefore be chosen from source data only. Based on evaluating our method on source data only (the target domain is fixed to be the source domain) for several choices of rcond, we obtain a stable choice of rcond = 10^-1 for all datasets.
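A minimal sketch of this regularized inversion. Note that numpy.linalg.pinv interprets rcond as a cutoff relative to the largest singular value; the 2×2 matrix below is an illustrative stand-in for G~, with two nearly identical models making it almost singular.

```python
import numpy as np

def stable_inverse(G: np.ndarray, rcond: float = 1e-1) -> np.ndarray:
    """Regularized pseudo-inverse of a Gram matrix: singular values below
    rcond times the largest singular value are discarded, which stabilizes
    the inversion when near-identical models make G ill-conditioned."""
    return np.linalg.pinv(G, rcond=rcond)

# Two almost identical models produce a nearly singular Gram matrix.
G = np.array([[1.0, 0.999],
              [0.999, 1.0]])
G_inv = stable_inverse(G)  # bounded, unlike the exact inverse (~500 per entry)
```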

D.5 CORRELATION ANALYSIS: TARGET ACCURACY OF INDIVIDUAL MODELS VS. AGGREGATION WEIGHT

Figure 2 in the main paper suggests that there is a positive correlation between the target accuracy of each individual model (orange dashed line, Figure 2 top) and the respective aggregation weight (red bars, Figure 2 bottom), if the linear aggregation of the models is computed by our method IWA. In the following, we analyze whether this trend holds throughout all other experiments.
More precisely, for a given sequence f1, …, fl of models, we compute the Pearson correlation coefficient between the aggregation weights c~1, …, c~l (see Algorithm 1) and the corresponding target accuracies (1/t) Σ_{i=1}^{t} 1[yi = f1(xi)], …, (1/t) Σ_{i=1}^{t} 1[yi = fl(xi)] for the target test data (x1, y1), …, (xt, yt), where 1[P] = 1 iff P is true and 1[P] = 0 otherwise. In this context, a positive correlation coefficient means that the aggregation algorithm assigns a higher weight to models performing better on target samples. We calculate these coefficients for our method IWA and the other linear regression baselines SOR, TCR, and TMR for all domain adaptation methods across all datasets and cross-domain scenarios. Note that for this analysis we cannot compare to the other baseline TMV, as the count-based aggregation by majority voting does not involve the computation of aggregation weights.
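The computation of these correlation coefficients can be sketched as follows; the models, labels, and weights are hypothetical toy values, and the function name is illustrative.

```python
import numpy as np

def weight_accuracy_correlation(predictions, labels, weights):
    """Pearson correlation between aggregation weights c_1..c_l and the
    target accuracies of models f_1..f_l.
    predictions: (l, t) array of class predictions of l models on t samples;
    labels: (t,) array of target labels; weights: (l,) aggregation weights."""
    accuracies = (predictions == labels).mean(axis=1)  # (1/t) sum 1[y_i = f_j(x_i)]
    return np.corrcoef(weights, accuracies)[0, 1]

# Toy example: three models on five target samples.
labels = np.array([0, 1, 1, 0, 1])
preds = np.array([[0, 1, 1, 0, 1],   # accuracy 1.0
                  [0, 1, 0, 0, 1],   # accuracy 0.8
                  [1, 0, 0, 1, 0]])  # accuracy 0.0
rho = weight_accuracy_correlation(preds, labels, np.array([0.6, 0.3, 0.1]))
```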
In Figure 4 and 6 we compare the resulting correlation coefficient distribution of our method IWA to the ones for the heuristic baselines SOR, TCR, and TMR. Figure 5 compares the correlation coefficients for IWA on different datasets.
We find that IWA shows a stronger positive correlation between a model's target accuracy and its aggregation weight than the other methods.

D.6 SENSITIVITY ANALYSIS: EFFECT OF ADDING INACCURATE MODELS

We study the sensitivity of our method with respect to adding inaccurate models to the given sequence of models. We define an inaccurate model as one whose target accuracy is lower than 80% of the target accuracy of the model computed without domain adaptation (SO). In particular, we add +10, +50, and +100 inaccurate models to the given sequences of models. One inaccurate model is constructed as follows: First, a model is chosen uniformly at random from the given sequence of models. Second, the outputs of the chosen model are corrupted by adding, to half of the elements of its vector-valued output, random Gaussian noise with zero mean and unit variance.
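One plausible reading of this corruption step, sketched in NumPy; here the corrupted half is chosen uniformly over all output entries, and the function name is illustrative rather than taken from our codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupt_outputs(outputs: np.ndarray) -> np.ndarray:
    """Corrupt a model's vector-valued outputs (shape: n_samples x n_classes)
    by adding zero-mean, unit-variance Gaussian noise to a randomly chosen
    half of the output entries; the original array is left untouched."""
    flat = outputs.copy().ravel()
    idx = rng.choice(flat.size, size=flat.size // 2, replace=False)
    flat[idx] += rng.normal(0.0, 1.0, size=idx.size)
    return flat.reshape(outputs.shape)
```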
In Figure 7, we show the median performance over all domain adaptation methods for each dataset. We see that the performance of our method (IWA) is not very sensitive to an increase in the number of inaccurate models. This is in contrast to the heuristics (e.g., TMV, TMR, or TCR) and the model selection method DEV, which are highly sensitive to the addition of inaccurate models. SOR also shows stable results; it is, however, clearly outperformed by our method on five out of six datasets. Overall, IWA is a stable choice in the presence of inaccurate models in the given sequence.
Figure 6: Histogram of the correlation coefficients of IWA and the linear regression heuristic baselines SOR, TCR and TMR over all datasets (showing the same data as Figure 4). In comparison to the heuristic baselines, IWA shows a stronger positive correlation between a model's target accuracy and its aggregation weight.

D.7 DETAILED EMPIRICAL RESULTS

In this section, we provide all result tables for the datasets described in the main paper. Table 15, Table 16, and Table 17 show all domain adaptation tasks for the Amazon Reviews dataset. Table 13 shows all domain adaptation tasks for the MiniDomainNet experiments. Table 21, Table 22, Table 23, Table 24, Table 25, Table 26, Table 27, and Table 28 show all domain adaptation task results for the time-series datasets.
Baselines As addressed in the main paper, our method, IWA, is compared to ensemble learning methods that use linear regression and majority voting as heuristics for model aggregation, and to model selection methods with theoretical error guarantees. The heuristic baselines are majority voting on target data (TMV), source-only regression (SOR), target majority voting regression (TMR), and target confidence average regression (TCR). The model selection methods with theoretical error guarantees are importance weighted validation (IWV) (Sugiyama et al., 2007) and deep embedded validation (DEV) (Kouw et al., 2019). The tables also provide columns for the source-only (SO) performance and the target-best (TB) performance. We highlight in bold the performance of the best performing method with theoretical error guarantees, and in italics the best performing heuristic.
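As an illustration of the simplest of these baselines, a minimal sketch of majority voting over a model sequence; this is our reading of TMV (ties resolve to the lowest class index here), not the exact implementation used in the experiments.

```python
import numpy as np

def majority_vote(predictions: np.ndarray) -> np.ndarray:
    """Target majority voting (TMV): each of the l models casts one vote per
    target sample; the ensemble predicts the most frequent class.
    predictions: (l, t) array of class predictions of l models on t samples."""
    l, t = predictions.shape
    n_classes = predictions.max() + 1
    out = np.empty(t, dtype=int)
    for j in range(t):
        out[j] = np.bincount(predictions[:, j], minlength=n_classes).argmax()
    return out
```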
Figure 7: Sensitivity of methods for parameter choice issues w.r.t. adding inaccurate models in the given sequence of models; separate for each dataset (a-f), averaged over all domain adaptation methods, source-target pairs, and random seeds. Horizontal axes: Number of inaccurate models added to the initial sequence of models. Vertical axes: Target accuracy. Solid lines indicate median and shaded area indicate 50% confidence intervals.

D.7.1 DETAILED SUMMARY RESULTS

Table 11: Average target accuracies (and average standard deviations) for all 7 datasets (e.g., Sleep-EDF, MiniDomainNet, Amazon Reviews) taken over several domain adaptation tasks (e.g., 5 on Sleep-EDF, 5 on MiniDomainNet, 12 on Amazon Reviews), 11 domain adaptation methods (e.g., DANN, HoMM, CMD), and 3 repetitions with different random initializations of model weights. The input sequences of the approaches (e.g., DEV, IWA) consist of neural networks computed by runs of the domain adaptation methods with different hyper-parameters (e.g., 8 different values of λ for DANN).
(TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Dataset | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
Transformed Moons | 0.989(±0.008) | 0.980(±0.006) | 0.981(±0.007) | 0.997(±0.002) | 0.989(±0.010) | 0.989(±0.008) | 0.981(±0.022) | 0.997(±0.002) | 0.997(±0.005)
Amazon Reviews | 0.767(±0.011) | 0.787(±0.009) | 0.786(±0.010) | 0.786(±0.010) | 0.789(±0.010) | 0.772(±0.014) | 0.764(±0.019) | 0.788(±0.009) | 0.781(±0.012)
MiniDomainNet | 0.507(±0.022) | 0.526(±0.011) | 0.525(±0.014) | 0.526(±0.013) | 0.518(±0.012) | 0.513(±0.022) | 0.515(±0.028) | 0.531(±0.011) | 0.534(±0.022)
Sleep-EDF | 0.655(±0.054) | 0.729(±0.018) | 0.729(±0.024) | 0.725(±0.023) | 0.717(±0.028) | 0.700(±0.052) | 0.660(±0.057) | 0.737(±0.020) | 0.712(±0.045)
UCI-HAR | 0.770(±0.046) | 0.840(±0.017) | 0.833(±0.023) | 0.832(±0.024) | 0.769(±0.060) | 0.774(±0.070) | 0.765(±0.090) | 0.835(±0.020) | 0.850(±0.029)
HHAR | 0.732(±0.042) | 0.771(±0.015) | 0.768(±0.017) | 0.771(±0.018) | 0.722(±0.068) | 0.746(±0.037) | 0.722(±0.063) | 0.787(±0.012) | 0.784(±0.028)
WISDM | 0.736(±0.050) | 0.768(±0.027) | 0.768(±0.036) | 0.765(±0.037) | 0.737(±0.062) | 0.736(±0.052) | 0.726(±0.077) | 0.764(±0.025) | 0.771(±0.046)
Table 12: Mean and standard deviation (after ±) of target classification accuracy on the Amazon Reviews dataset over three different random initializations of model weights and 12 domain adaptation tasks.
Amazon Reviews (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.769(±0.009) | 0.777(±0.010) | 0.778(±0.010) | 0.778(±0.011) | 0.777(±0.010) | 0.765(±0.011) | 0.766(±0.011) | 0.778(±0.010) | 0.769(±0.012)
AdvSKM | 0.766(±0.012) | 0.780(±0.009) | 0.779(±0.010) | 0.779(±0.008) | 0.778(±0.011) | 0.769(±0.012) | 0.766(±0.012) | 0.780(±0.009) | 0.770(±0.012)
DIRT | 0.764(±0.009) | 0.786(±0.008) | 0.786(±0.010) | 0.786(±0.008) | 0.800(±0.008) | 0.778(±0.022) | 0.773(±0.056) | 0.787(±0.008) | 0.786(±0.009)
DDC | 0.766(±0.012) | 0.779(±0.010) | 0.780(±0.009) | 0.779(±0.010) | 0.778(±0.010) | 0.767(±0.017) | 0.768(±0.011) | 0.780(±0.010) | 0.770(±0.013)
CMD | 0.767(±0.012) | 0.791(±0.009) | 0.792(±0.009) | 0.789(±0.010) | 0.792(±0.010) | 0.765(±0.015) | 0.710(±0.015) | 0.794(±0.009) | 0.785(±0.009)
MMDA | 0.767(±0.011) | 0.787(±0.011) | 0.785(±0.010) | 0.785(±0.010) | 0.787(±0.012) | 0.769(±0.011) | 0.766(±0.010) | 0.787(±0.011) | 0.782(±0.011)
CoDATS | 0.766(±0.013) | 0.795(±0.009) | 0.793(±0.010) | 0.794(±0.012) | 0.799(±0.010) | 0.779(±0.016) | 0.773(±0.020) | 0.796(±0.009) | 0.791(±0.015)
Deep-Coral | 0.766(±0.012) | 0.784(±0.009) | 0.783(±0.009) | 0.783(±0.009) | 0.782(±0.009) | 0.769(±0.016) | 0.769(±0.037) | 0.785(±0.009) | 0.776(±0.013)
CDAN | 0.767(±0.012) | 0.788(±0.010) | 0.787(±0.009) | 0.787(±0.010) | 0.787(±0.011) | 0.775(±0.011) | 0.776(±0.014) | 0.788(±0.010) | 0.777(±0.011)
DANN | 0.767(±0.012) | 0.796(±0.010) | 0.792(±0.010) | 0.793(±0.010) | 0.800(±0.011) | 0.776(±0.011) | 0.778(±0.012) | 0.797(±0.009) | 0.798(±0.012)
DSAN | 0.769(±0.009) | 0.796(±0.009) | 0.792(±0.009) | 0.791(±0.010) | 0.800(±0.010) | 0.779(±0.012) | 0.763(±0.017) | 0.795(±0.009) | 0.789(±0.012)
Avg. | 0.767(±0.011) | 0.787(±0.009) | 0.786(±0.010) | 0.786(±0.010) | 0.789(±0.010) | 0.772(±0.014) | 0.764(±0.019) | 0.788(±0.009) | 0.781(±0.012)
Table 13: Mean and standard deviation (after ±) of target classification accuracy on the MiniDomainNet dataset over three different random initializations of model weights and five domain adaptation tasks.
MiniDomainNet (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.509(±0.018) | 0.526(±0.009) | 0.524(±0.010) | 0.523(±0.010) | 0.518(±0.015) | 0.511(±0.018) | 0.511(±0.018) | 0.531(±0.007) | 0.537(±0.020)
AdvSKM | 0.509(±0.015) | 0.516(±0.011) | 0.514(±0.014) | 0.514(±0.017) | 0.514(±0.009) | 0.515(±0.032) | 0.512(±0.032) | 0.516(±0.011) | 0.522(±0.017)
DIRT | 0.499(±0.022) | 0.517(±0.006) | 0.515(±0.009) | 0.508(±0.006) | 0.507(±0.018) | 0.493(±0.028) | 0.498(±0.033) | 0.519(±0.006) | 0.525(±0.020)
DDC | 0.510(±0.032) | 0.514(±0.012) | 0.512(±0.018) | 0.511(±0.019) | 0.514(±0.015) | 0.511(±0.028) | 0.512(±0.043) | 0.516(±0.015) | 0.521(±0.019)
CMD | 0.509(±0.019) | 0.528(±0.011) | 0.531(±0.016) | 0.531(±0.016) | 0.522(±0.013) | 0.518(±0.025) | 0.490(±0.037) | 0.533(±0.009) | 0.533(±0.024)
MMDA | 0.509(±0.021) | 0.524(±0.004) | 0.522(±0.005) | 0.523(±0.007) | 0.519(±0.012) | 0.517(±0.008) | 0.524(±0.006) | 0.527(±0.009) | 0.531(±0.029)
CoDATS | 0.502(±0.026) | 0.535(±0.026) | 0.533(±0.027) | 0.536(±0.028) | 0.525(±0.015) | 0.518(±0.031) | 0.524(±0.051) | 0.536(±0.025) | 0.529(±0.042)
Deep-Coral | 0.505(±0.022) | 0.538(±0.009) | 0.534(±0.013) | 0.533(±0.012) | 0.520(±0.010) | 0.518(±0.022) | 0.524(±0.034) | 0.539(±0.012) | 0.535(±0.024)
CDAN | 0.514(±0.028) | 0.526(±0.013) | 0.524(±0.011) | 0.525(±0.008) | 0.517(±0.005) | 0.513(±0.011) | 0.516(±0.014) | 0.532(±0.010) | 0.542(±0.017)
DANN | 0.496(±0.023) | 0.530(±0.011) | 0.535(±0.011) | 0.538(±0.005) | 0.516(±0.011) | 0.519(±0.015) | 0.515(±0.013) | 0.541(±0.006) | 0.532(±0.014)
DSAN | 0.509(±0.022) | 0.537(±0.011) | 0.534(±0.013) | 0.543(±0.011) | 0.525(±0.013) | 0.513(±0.018) | 0.535(±0.023) | 0.546(±0.007) | 0.563(±0.013)
Avg. | 0.507(±0.022) | 0.526(±0.011) | 0.525(±0.014) | 0.526(±0.013) | 0.518(±0.012) | 0.513(±0.022) | 0.515(±0.028) | 0.531(±0.011) | 0.534(±0.022)
Table 14: Mean and standard deviation (after ±) of target classification accuracy on four time series datasets over three different random initializations of model weights and five domain adaptation tasks.
Sleep-EDF (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.676(±0.036) | 0.722(±0.017) | 0.719(±0.023) | 0.718(±0.021) | 0.724(±0.032) | 0.726(±0.046) | 0.678(±0.035) | 0.747(±0.025) | 0.715(±0.047)
AdvSKM | 0.665(±0.058) | 0.708(±0.023) | 0.712(±0.027) | 0.712(±0.032) | 0.718(±0.030) | 0.703(±0.069) | 0.692(±0.038) | 0.722(±0.025) | 0.706(±0.054)
DIRT | 0.656(±0.058) | 0.743(±0.009) | 0.745(±0.012) | 0.748(±0.019) | 0.742(±0.031) | 0.679(±0.038) | 0.686(±0.066) | 0.749(±0.010) | 0.728(±0.037)
DDC | 0.646(±0.035) | 0.717(±0.029) | 0.721(±0.037) | 0.712(±0.031) | 0.695(±0.020) | 0.694(±0.056) | 0.666(±0.031) | 0.724(±0.012) | 0.704(±0.031)
CMD | 0.653(±0.057) | 0.740(±0.022) | 0.736(±0.016) | 0.723(±0.020) | 0.709(±0.015) | 0.716(±0.052) | 0.640(±0.068) | 0.729(±0.018) | 0.725(±0.053)
MMDA | 0.650(±0.051) | 0.736(±0.014) | 0.727(±0.021) | 0.723(±0.018) | 0.714(±0.028) | 0.704(±0.033) | 0.660(±0.034) | 0.745(±0.031) | 0.715(±0.042)
CoDATS | 0.672(±0.084) | 0.738(±0.029) | 0.739(±0.036) | 0.736(±0.030) | 0.723(±0.039) | 0.683(±0.090) | 0.690(±0.107) | 0.744(±0.021) | 0.715(±0.045)
Deep-Coral | 0.643(±0.049) | 0.716(±0.018) | 0.717(±0.028) | 0.712(±0.027) | 0.694(±0.032) | 0.700(±0.053) | 0.675(±0.077) | 0.713(±0.021) | 0.702(±0.070)
CDAN | 0.652(±0.056) | 0.732(±0.016) | 0.739(±0.024) | 0.739(±0.018) | 0.728(±0.029) | 0.697(±0.031) | 0.642(±0.065) | 0.748(±0.019) | 0.713(±0.045)
DANN | 0.641(±0.047) | 0.722(±0.017) | 0.723(±0.026) | 0.721(±0.025) | 0.714(±0.024) | 0.687(±0.034) | 0.644(±0.046) | 0.724(±0.018) | 0.710(±0.035)
DSAN | 0.653(±0.060) | 0.748(±0.008) | 0.740(±0.016) | 0.732(±0.016) | 0.728(±0.026) | 0.712(±0.070) | 0.589(±0.063) | 0.757(±0.016) | 0.700(±0.033)
Avg. | 0.655(±0.054) | 0.729(±0.018) | 0.729(±0.024) | 0.725(±0.023) | 0.717(±0.028) | 0.700(±0.052) | 0.660(±0.057) | 0.737(±0.020) | 0.712(±0.045)
UCI-HAR (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.782(±0.078) | 0.833(±0.020) | 0.818(±0.023) | 0.818(±0.022) | 0.783(±0.040) | 0.809(±0.095) | 0.800(±0.098) | 0.826(±0.010) | 0.854(±0.039)
AdvSKM | 0.724(±0.059) | 0.791(±0.024) | 0.800(±0.022) | 0.810(±0.022) | 0.768(±0.042) | 0.707(±0.100) | 0.711(±0.167) | 0.800(±0.022) | 0.811(±0.039)
DIRT | 0.783(±0.044) | 0.912(±0.013) | 0.907(±0.009) | 0.890(±0.016) | 0.756(±0.036) | 0.807(±0.107) | 0.808(±0.112) | 0.900(±0.015) | 0.928(±0.034)
DDC | 0.790(±0.061) | 0.806(±0.019) | 0.807(±0.026) | 0.810(±0.017) | 0.756(±0.108) | 0.724(±0.066) | 0.734(±0.109) | 0.804(±0.028) | 0.792(±0.013)
CMD | 0.788(±0.058) | 0.869(±0.012) | 0.849(±0.014) | 0.839(±0.023) | 0.731(±0.066) | 0.804(±0.064) | 0.812(±0.080) | 0.842(±0.025) | 0.888(±0.037)
MMDA | 0.785(±0.018) | 0.819(±0.022) | 0.812(±0.028) | 0.800(±0.032) | 0.759(±0.085) | 0.773(±0.073) | 0.767(±0.107) | 0.807(±0.025) | 0.840(±0.055)
CoDATS | 0.760(±0.037) | 0.854(±0.022) | 0.832(±0.027) | 0.832(±0.006) | 0.785(±0.057) | 0.801(±0.079) | 0.794(±0.078) | 0.846(±0.016) | 0.867(±0.012)
Deep-Coral | 0.790(±0.035) | 0.810(±0.007) | 0.800(±0.022) | 0.808(±0.030) | 0.771(±0.023) | 0.768(±0.044) | 0.773(±0.087) | 0.808(±0.016) | 0.806(±0.022)
CDAN | 0.756(±0.055) | 0.842(±0.009) | 0.843(±0.020) | 0.840(±0.034) | 0.802(±0.080) | 0.781(±0.072) | 0.687(±0.068) | 0.846(±0.018) | 0.853(±0.026)
DANN | 0.756(±0.026) | 0.858(±0.016) | 0.856(±0.033) | 0.856(±0.033) | 0.800(±0.057) | 0.763(±0.032) | 0.780(±0.043) | 0.849(±0.023) | 0.847(±0.007)
DSAN | 0.762(±0.032) | 0.849(±0.023) | 0.843(±0.033) | 0.854(±0.025) | 0.749(±0.065) | 0.775(±0.043) | 0.744(±0.035) | 0.858(±0.023) | 0.865(±0.038)
Avg. | 0.770(±0.046) | 0.840(±0.017) | 0.833(±0.023) | 0.832(±0.024) | 0.769(±0.060) | 0.774(±0.070) | 0.765(±0.090) | 0.835(±0.020) | 0.850(±0.029)
HHAR (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.739(±0.044) | 0.757(±0.014) | 0.759(±0.013) | 0.759(±0.011) | 0.700(±0.058) | 0.720(±0.027) | 0.733(±0.031) | 0.759(±0.007) | 0.764(±0.023)
AdvSKM | 0.718(±0.042) | 0.749(±0.027) | 0.742(±0.032) | 0.748(±0.034) | 0.676(±0.046) | 0.730(±0.051) | 0.728(±0.051) | 0.752(±0.031) | 0.749(±0.025)
DIRT | 0.728(±0.026) | 0.803(±0.011) | 0.792(±0.016) | 0.803(±0.017) | 0.796(±0.066) | 0.743(±0.028) | 0.739(±0.075) | 0.816(±0.008) | 0.820(±0.015)
DDC | 0.716(±0.063) | 0.748(±0.014) | 0.750(±0.009) | 0.748(±0.007) | 0.717(±0.075) | 0.711(±0.048) | 0.705(±0.066) | 0.748(±0.012) | 0.729(±0.027)
CMD | 0.748(±0.027) | 0.760(±0.014) | 0.764(±0.006) | 0.767(±0.007) | 0.737(±0.100) | 0.775(±0.031) | 0.643(±0.031) | 0.766(±0.016) | 0.794(±0.023)
MMDA | 0.738(±0.036) | 0.783(±0.017) | 0.781(±0.016) | 0.780(±0.015) | 0.698(±0.038) | 0.719(±0.036) | 0.731(±0.047) | 0.780(±0.017) | 0.785(±0.035)
CoDATS | 0.710(±0.030) | 0.766(±0.023) | 0.772(±0.040) | 0.773(±0.050) | 0.722(±0.064) | 0.739(±0.028) | 0.739(±0.040) | 0.812(±0.009) | 0.785(±0.039)
Deep-Coral | 0.745(±0.046) | 0.766(±0.012) | 0.762(±0.015) | 0.766(±0.027) | 0.681(±0.073) | 0.754(±0.054) | 0.758(±0.244) | 0.764(±0.006) | 0.776(±0.023)
CDAN | 0.728(±0.039) | 0.762(±0.012) | 0.758(±0.017) | 0.764(±0.016) | 0.765(±0.063) | 0.774(±0.035) | 0.775(±0.036) | 0.816(±0.011) | 0.790(±0.038)
DANN | 0.757(±0.057) | 0.779(±0.012) | 0.774(±0.009) | 0.773(±0.011) | 0.722(±0.103) | 0.798(±0.041) | 0.793(±0.045) | 0.818(±0.009) | 0.807(±0.020)
DSAN | 0.721(±0.053) | 0.803(±0.010) | 0.797(±0.014) | 0.802(±0.007) | 0.724(±0.065) | 0.741(±0.033) | 0.596(±0.031) | 0.825(±0.008) | 0.826(±0.046)
Avg. | 0.732(±0.042) | 0.771(±0.015) | 0.768(±0.017) | 0.771(±0.018) | 0.722(±0.068) | 0.746(±0.037) | 0.722(±0.063) | 0.787(±0.012) | 0.784(±0.028)
WISDM (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.753(±0.054) | 0.741(±0.026) | 0.738(±0.031) | 0.739(±0.047) | 0.775(±0.062) | 0.753(±0.054) | 0.740(±0.054) | 0.728(±0.021) | 0.774(±0.037)
AdvSKM | 0.747(±0.050) | 0.771(±0.043) | 0.781(±0.055) | 0.779(±0.035) | 0.742(±0.062) | 0.747(±0.050) | 0.747(±0.135) | 0.777(±0.031) | 0.779(±0.041)
DIRT | 0.738(±0.038) | 0.792(±0.015) | 0.797(±0.024) | 0.797(±0.037) | 0.756(±0.071) | 0.738(±0.059) | 0.797(±0.059) | 0.798(±0.018) | 0.816(±0.063)
DDC | 0.741(±0.071) | 0.780(±0.032) | 0.779(±0.052) | 0.787(±0.049) | 0.737(±0.071) | 0.741(±0.076) | 0.741(±0.063) | 0.782(±0.038) | 0.770(±0.060)
CMD | 0.710(±0.088) | 0.772(±0.021) | 0.765(±0.032) | 0.767(±0.040) | 0.728(±0.092) | 0.713(±0.084) | 0.686(±0.113) | 0.773(±0.032) | 0.742(±0.071)
MMDA | 0.759(±0.047) | 0.789(±0.017) | 0.772(±0.030) | 0.745(±0.035) | 0.754(±0.050) | 0.759(±0.047) | 0.750(±0.047) | 0.790(±0.018) | 0.775(±0.030)
CoDATS | 0.711(±0.039) | 0.775(±0.018) | 0.757(±0.027) | 0.751(±0.020) | 0.682(±0.057) | 0.709(±0.039) | 0.735(±0.054) | 0.764(±0.015) | 0.770(±0.019)
Deep-Coral | 0.694(±0.030) | 0.717(±0.041) | 0.723(±0.037) | 0.713(±0.035) | 0.664(±0.055) | 0.694(±0.030) | 0.670(±0.149) | 0.723(±0.026) | 0.736(±0.044)
CDAN | 0.760(±0.057) | 0.762(±0.048) | 0.762(±0.046) | 0.781(±0.051) | 0.768(±0.074) | 0.760(±0.057) | 0.750(±0.091) | 0.765(±0.040) | 0.779(±0.049)
DANN | 0.724(±0.042) | 0.789(±0.018) | 0.802(±0.028) | 0.796(±0.036) | 0.745(±0.026) | 0.720(±0.042) | 0.702(±0.053) | 0.778(±0.019) | 0.765(±0.044)
DSAN | 0.759(±0.030) | 0.765(±0.024) | 0.769(±0.034) | 0.756(±0.025) | 0.757(±0.068) | 0.759(±0.030) | 0.663(±0.025) | 0.722(±0.013) | 0.779(±0.044)
Avg. | 0.736(±0.050) | 0.768(±0.027) | 0.768(±0.036) | 0.765(±0.037) | 0.737(±0.062) | 0.736(±0.052) | 0.726(±0.077) | 0.764(±0.025) | 0.771(±0.046)

D.7.2 DETAILED AMAZON REVIEWS RESULTS

Table 15: Mean and standard deviation (after ±) of target classification accuracy on Amazon Reviews (Part 1) over three repetitions with different random initializations of model weights.
HoMM (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.778(±0.01) | 0.793(±0.008) | 0.793(±0.008) | 0.795(±0.002) | 0.795(±0.008) | 0.782(±0.021) | 0.782(±0.021) | 0.794(±0.007) | 0.789(±0.008)
BE | 0.745(±0.014) | 0.759(±0.013) | 0.759(±0.013) | 0.761(±0.012) | 0.763(±0.012) | 0.747(±0.005) | 0.749(±0.025) | 0.761(±0.012) | 0.758(±0.012)
BK | 0.77(±0.018) | 0.782(±0.018) | 0.782(±0.018) | 0.782(±0.017) | 0.779(±0.013) | 0.761(±0.018) | 0.767(±0.003) | 0.781(±0.017) | 0.776(±0.018)
DB | 0.785(±0.006) | 0.791(±0.004) | 0.791(±0.004) | 0.79(±0.002) | 0.788(±0.008) | 0.778(±0.002) | 0.777(±0.015) | 0.79(±0.003) | 0.787(±0.003)
DE | 0.772(±0.012) | 0.781(±0.003) | 0.781(±0.003) | 0.779(±0.004) | 0.778(±0.004) | 0.764(±0.017) | 0.769(±0.009) | 0.781(±0.004) | 0.774(±0.003)
DK | 0.783(±0.015) | 0.791(±0.016) | 0.791(±0.016) | 0.791(±0.016) | 0.791(±0.017) | 0.781(±0.01) | 0.782(±0.01) | 0.789(±0.016) | 0.786(±0.017)
EB | 0.702(±0.026) | 0.7(±0.02) | 0.7(±0.02) | 0.701(±0.019) | 0.699(±0.021) | 0.69(±0.011) | 0.692(±0.014) | 0.698(±0.011) | 0.703(±0.03)
ED | 0.733(±0.003) | 0.733(±0.006) | 0.733(±0.006) | 0.736(±0.011) | 0.733(±0.008) | 0.728(±0.011) | 0.719(±0.005) | 0.735(±0.011) | 0.735(±0.011)
EK | 0.862(±0.009) | 0.874(±0.008) | 0.874(±0.008) | 0.876(±0.009) | 0.873(±0.008) | 0.856(±0.01) | 0.858(±0.012) | 0.873(±0.009) | 0.863(±0.009)
KB | 0.715(±0.008) | 0.721(±0.004) | 0.721(±0.004) | 0.719(±0.004) | 0.721(±0.01) | 0.714(±0.007) | 0.708(±0.003) | 0.722(±0.007) | 0.717(±0.007)
KD | 0.745(±0.011) | 0.752(±0.01) | 0.752(±0.01) | 0.747(±0.013) | 0.755(±0.008) | 0.735(±0.018) | 0.742(±0.022) | 0.749(±0.006) | 0.745(±0.008)
KE | 0.844(±0.009) | 0.854(±0.008) | 0.854(±0.008) | 0.855(±0.005) | 0.857(±0.009) | 0.845(±0.005) | 0.844(±0.006) | 0.856(±0.006) | 0.849(±0.009)
Avg. | 0.769(±0.012) | 0.778(±0.01) | 0.778(±0.01) | 0.778(±0.01) | 0.777(±0.011) | 0.765(±0.011) | 0.766(±0.012) | 0.778(±0.009) | 0.774(±0.011)
AdvSKM (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.784(±0.004) | 0.79(±0.003) | 0.79(±0.003) | 0.789(±0.003) | 0.79(±0.003) | 0.777(±0.008) | 0.778(±0.008) | 0.794(±0.005) | 0.79(±0.005)
BE | 0.754(±0.009) | 0.761(±0.014) | 0.761(±0.014) | 0.763(±0.012) | 0.762(±0.011) | 0.754(±0.003) | 0.751(±0.007) | 0.761(±0.012) | 0.756(±0.012)
BK | 0.769(±0.021) | 0.784(±0.019) | 0.784(±0.019) | 0.782(±0.021) | 0.78(±0.016) | 0.771(±0.012) | 0.764(±0.016) | 0.782(±0.017) | 0.774(±0.019)
DB | 0.779(±0.004) | 0.795(±0.007) | 0.795(±0.007) | 0.794(±0.007) | 0.792(±0.003) | 0.785(±0.007) | 0.78(±0.011) | 0.796(±0.009) | 0.789(±0.009)
DE | 0.766(±0.013) | 0.785(±0.006) | 0.785(±0.006) | 0.784(±0.008) | 0.777(±0.005) | 0.773(±0.005) | 0.769(±0.009) | 0.785(±0.005) | 0.776(±0.002)
DK | 0.777(±0.015) | 0.79(±0.015) | 0.79(±0.015) | 0.79(±0.015) | 0.791(±0.013) | 0.775(±0.014) | 0.775(±0.014) | 0.789(±0.013) | 0.786(±0.013)
EB | 0.688(±0.015) | 0.709(±0.022) | 0.709(±0.022) | 0.705(±0.024) | 0.709(±0.021) | 0.704(±0.022) | 0.695(±0.022) | 0.712(±0.021) | 0.707(±0.021)
ED | 0.714(±0.005) | 0.734(±0.002) | 0.734(±0.002) | 0.737(±0.005) | 0.733(±0.008) | 0.725(±0.004) | 0.725(±0.004) | 0.732(±0.006) | 0.737(±0.011)
EK | 0.858(±0.004) | 0.874(±0.01) | 0.874(±0.01) | 0.874(±0.012) | 0.876(±0.01) | 0.86(±0.012) | 0.86(±0.012) | 0.875(±0.011) | 0.865(±0.01)
KB | 0.713(±0.002) | 0.723(±0.006) | 0.723(±0.006) | 0.724(±0.009) | 0.72(±0.004) | 0.708(±0.014) | 0.705(±0.011) | 0.726(±0.006) | 0.718(±0.006)
KD | 0.741(±0.01) | 0.752(±0.012) | 0.752(±0.012) | 0.751(±0.013) | 0.752(±0.017) | 0.75(±0.019) | 0.742(±0.008) | 0.749(±0.012) | 0.756(±0.014)
KE | 0.848(±0.008) | 0.857(±0.007) | 0.857(±0.007) | 0.858(±0.008) | 0.856(±0.008) | 0.846(±0.012) | 0.847(±0.007) | 0.858(±0.008) | 0.85(±0.008)
Avg. | 0.766(±0.009) | 0.779(±0.01) | 0.779(±0.01) | 0.779(±0.011) | 0.778(±0.01) | 0.769(±0.011) | 0.766(±0.011) | 0.78(±0.01) | 0.775(±0.011)
DIRT (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.777(±0.01) | 0.816(±0.008) | 0.816(±0.008) | 0.817(±0.008) | 0.813(±0.007) | 0.791(±0.012) | 0.787(±0.032) | 0.813(±0.004) | 0.809(±0.008)
BE | 0.756(±0.013) | 0.792(±0.006) | 0.792(±0.006) | 0.792(±0.01) | 0.782(±0.008) | 0.758(±0.011) | 0.758(±0.011) | 0.788(±0.008) | 0.804(±0.005)
BK | 0.77(±0.013) | 0.815(±0.003) | 0.815(±0.003) | 0.814(±0.005) | 0.813(±0.006) | 0.793(±0.009) | 0.782(±0.025) | 0.814(±0.003) | 0.814(±0.003)
DB | 0.773(±0.008) | 0.812(±0.014) | 0.812(±0.014) | 0.809(±0.019) | 0.814(±0.01) | 0.805(±0.015) | 0.808(±0.004) | 0.81(±0.01) | 0.811(±0.01)
DE | 0.757(±0.02) | 0.813(±0.009) | 0.813(±0.009) | 0.814(±0.012) | 0.808(±0.012) | 0.787(±0.028) | 0.807(±0.024) | 0.814(±0.009) | 0.812(±0.009)
DK | 0.775(±0.016) | 0.836(±0.009) | 0.836(±0.009) | 0.835(±0.008) | 0.837(±0.009) | 0.789(±0.025) | 0.787(±0.038) | 0.832(±0.006) | 0.828(±0.004)
EB | 0.702(±0.018) | 0.672(±0.015) | 0.672(±0.015) | 0.674(±0.015) | 0.714(±0.016) | 0.702(±0.018) | 0.675(±0.05) | 0.678(±0.015) | 0.708(±0.016)
ED | 0.716(±0.008) | 0.689(±0.019) | 0.689(±0.019) | 0.69(±0.022) | 0.731(±0.029) | 0.718(±0.01) | 0.704(±0.027) | 0.702(±0.021) | 0.73(±0.029)
EK | 0.855(±0.013) | 0.888(±0.01) | 0.888(±0.01) | 0.885(±0.009) | 0.894(±0.005) | 0.874(±0.008) | 0.873(±0.009) | 0.884(±0.009) | 0.889(±0.005)
KB | 0.709(±0.012) | 0.707(±0.007) | 0.707(±0.007) | 0.711(±0.003) | 0.749(±0.005) | 0.73(±0.02) | 0.715(±0.005) | 0.711(±0.008) | 0.723(±0.005)
KD | 0.736(±0.012) | 0.721(±0.01) | 0.721(±0.01) | 0.727(±0.019) | 0.779(±0.003) | 0.745(±0.022) | 0.731(±0.007) | 0.727(±0.006) | 0.756(±0.003)
KE | 0.842(±0.009) | 0.869(±0.01) | 0.869(±0.01) | 0.868(±0.01) | 0.872(±0.009) | 0.848(±0.012) | 0.848(±0.008) | 0.868(±0.01) | 0.863(±0.009)
Avg. | 0.764(±0.013) | 0.786(±0.01) | 0.786(±0.01) | 0.786(±0.012) | 0.8(±0.01) | 0.778(±0.016) | 0.773(±0.02) | 0.787(±0.009) | 0.795(±0.009)
DDC (TMV, TMR, TCR, SOR: heuristics; IWV, DEV, IWA: theoretical error guarantees)
Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.781(±0.016) | 0.796(±0.008) | 0.796(±0.008) | 0.793(±0.013) | 0.794(±0.002) | 0.775(±0.017) | 0.778(±0.02) | 0.789(±0.009) | 0.786(±0.008)
BE | 0.752(±0.007) | 0.762(±0.014) | 0.762(±0.014) | 0.762(±0.013) | 0.766(±0.01) | 0.752(±0.013) | 0.749(±0.006) | 0.763(±0.014) | 0.756(±0.01)
BK | 0.766(±0.015) | 0.781(±0.015) | 0.781(±0.015) | 0.782(±0.017) | 0.778(±0.02) | 0.776(±0.016) | 0.774(±0.008) | 0.783(±0.014) | 0.78(±0.014)
DB | 0.781(±0.008) | 0.798(±0.007) | 0.793(±0.007) | 0.791(±0.006) | 0.789(±0.012) | 0.787(±0.009) | 0.783(±0.005) | 0.796(±0.006) | 0.788(±0.006)
DE | 0.767(±0.015) | 0.782(±0.004) | 0.782(±0.004) | 0.78(±0.004) | 0.783(±0.008) | 0.778(±0.002) | 0.776(±0.004) | 0.784(±0.004) | 0.778(±0.004)
DK | 0.782(±0.013) | 0.792(±0.015) | 0.792(±0.015) | 0.793(±0.015) | 0.792(±0.018) | 0.782(±0.014) | 0.783(±0.014) | 0.793(±0.017) | 0.788(±0.017)
EB | 0.693(±0.019) | 0.703(±0.023) | 0.703(±0.023) | 0.706(±0.021) | 0.7(±0.023) | 0.696(±0.011) | 0.689(±0.021) | 0.705(±0.025) | 0.704(±0.021)
ED | 0.725(±0.006) | 0.736(±0.006) | 0.736(±0.004) | 0.731(±0.006) | 0.735(±0.012) | 0.711(±0.009) | 0.736(±0.004) | 0.736(±0.008) | 0.736(±0.007)
EK | 0.857(±0.011) | 0.874(±0.009) | 0.874(±0.009) | 0.872(±0.007) | 0.874(±0.006) | 0.857(±0.014) | 0.853(±0.009) | 0.877(±0.011) | 0.864(±0.011)
KB | 0.713(±0.002) | 0.728(±0.002) | 0.728(±0.002) | 0.727(±0.005) | 0.721(±0.003) | 0.709(±0.015) | 0.708(±0.007) | 0.726(±0.003) | 0.719(±0.002)
KD | 0.741(±0.018) | 0.752(±0.011) | 0.752(±0.011) | 0.753(±0.01) | 0.752(±0.016) | 0.743(±0.012) | 0.748(±0.01) | 0.755(±0.013) | 0.748(±0.012)
KE | 0.841(±0.001) | 0.855(±0.011) | 0.855(±0.011) | 0.856(±0.008) | 0.858(±0.008) | 0.84(±0.003) | 0.84(±0.01) | 0.856(±0.008) | 0.849(±0.008)
Avg. | 0.766(±0.011) | 0.78(±0.01) | 0.78(±0.01) | 0.779(±0.01) | 0.778(±0.012) | 0.767(±0.011) | 0.768(±0.01) | 0.78(±0.011) | 0.775(±0.01)
Table 16: Mean and standard deviation (after ±) of target classification accuracy on Amazon Reviews (Part 2) over 3 repetitions with different random initializations of model weights.
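Each cell in these tables aggregates the per-repetition target accuracies into a mean with its standard deviation. A minimal sketch of that aggregation, using illustrative accuracy values that are not taken from the experiments (whether the thesis uses the population or sample standard deviation is an assumption here; the sketch uses the population form):

```python
from statistics import fmean, pstdev

# Hypothetical per-repetition target accuracies for one task/method pair
# (illustrative values, not from the tables above).
accuracies = [0.781, 0.796, 0.790]

mean = fmean(accuracies)
std = pstdev(accuracies)  # population standard deviation over the 3 repetitions

# Format as the tables report it: mean followed by (±std), rounded to 3 decimals.
cell = f"{round(mean, 3)}(±{round(std, 3)})"
print(cell)  # 0.789(±0.006)
```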
Task: CMD
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.772 (±0.009) | 0.794 (±0.008) | 0.794 (±0.008) | 0.794 (±0.004) | 0.798 (±0.003) | 0.779 (±0.018) | 0.748 (±0.036) | 0.8 (±0.003) | 0.789 (±0.003)
BE | 0.745 (±0.012) | 0.785 (±0.014) | 0.785 (±0.014) | 0.779 (±0.011) | 0.779 (±0.011) | 0.746 (±0.032) | 0.651 (±0.077) | 0.782 (±0.01) | 0.78 (±0.014)
BK | 0.763 (±0.015) | 0.796 (±0.01) | 0.796 (±0.01) | 0.79 (±0.007) | 0.794 (±0.013) | 0.787 (±0.011) | 0.749 (±0.028) | 0.797 (±0.007) | 0.793 (±0.008)
DB | 0.788 (±0.01) | 0.801 (±0.007) | 0.801 (±0.007) | 0.798 (±0.003) | 0.804 (±0.002) | 0.785 (±0.011) | 0.754 (±0.022) | 0.803 (±0.008) | 0.794 (±0.002)
DE | 0.768 (±0.003) | 0.804 (±0.006) | 0.804 (±0.006) | 0.798 (±0.003) | 0.801 (±0.013) | 0.777 (±0.031) | 0.744 (±0.095) | 0.802 (±0.004) | 0.798 (±0.007)
DK | 0.777 (±0.012) | 0.803 (±0.009) | 0.803 (±0.009) | 0.797 (±0.014) | 0.804 (±0.009) | 0.773 (±0.006) | 0.673 (±0.094) | 0.801 (±0.011) | 0.811 (±0.007)
EB | 0.699 (±0.014) | 0.72 (±0.013) | 0.72 (±0.013) | 0.717 (±0.014) | 0.724 (±0.013) | 0.71 (±0.016) | 0.68 (±0.039) | 0.719 (±0.013) | 0.712 (±0.013)
ED | 0.722 (±0.008) | 0.752 (±0.006) | 0.752 (±0.006) | 0.752 (±0.004) | 0.751 (±0.005) | 0.693 (±0.041) | 0.573 (±0.01) | 0.755 (±0.003) | 0.738 (±0.003)
EK | 0.86 (±0.012) | 0.874 (±0.013) | 0.874 (±0.013) | 0.872 (±0.007) | 0.875 (±0.008) | 0.822 (±0.058) | 0.807 (±0.054) | 0.876 (±0.009) | 0.871 (±0.009)
KB | 0.718 (±0.005) | 0.749 (±0.007) | 0.749 (±0.007) | 0.738 (±0.003) | 0.74 (±0.003) | 0.716 (±0.022) | 0.679 (±0.056) | 0.746 (±0.004) | 0.74 (±0.007)
KD | 0.748 (±0.003) | 0.761 (±0.016) | 0.761 (±0.016) | 0.766 (±0.022) | 0.77 (±0.017) | 0.746 (±0.011) | 0.655 (±0.119) | 0.775 (±0.016) | 0.761 (±0.016)
KE | 0.842 (±0.005) | 0.864 (±0.006) | 0.864 (±0.006) | 0.862 (±0.002) | 0.869 (±0.004) | 0.85 (±0.011) | 0.806 (±0.041) | 0.867 (±0.008) | 0.858 (±0.004)
Avg. | 0.767 (±0.009) | 0.792 (±0.01) | 0.792 (±0.01) | 0.789 (±0.008) | 0.792 (±0.008) | 0.765 (±0.022) | 0.71 (±0.056) | 0.794 (±0.008) | 0.787 (±0.008)
Task: MMDA
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.775 (±0.005) | 0.796 (±0.005) | 0.796 (±0.005) | 0.797 (±0.009) | 0.794 (±0.015) | 0.783 (±0.004) | 0.786 (±0.006) | 0.797 (±0.003) | 0.792 (±0.004)
BE | 0.752 (±0.006) | 0.775 (±0.01) | 0.775 (±0.01) | 0.776 (±0.008) | 0.771 (±0.014) | 0.766 (±0.022) | 0.756 (±0.012) | 0.779 (±0.012) | 0.781 (±0.006)
BK | 0.765 (±0.009) | 0.793 (±0.016) | 0.793 (±0.016) | 0.792 (±0.017) | 0.803 (±0.019) | 0.763 (±0.009) | 0.767 (±0.024) | 0.795 (±0.017) | 0.802 (±0.019)
DB | 0.783 (±0.007) | 0.796 (±0.007) | 0.796 (±0.007) | 0.795 (±0.008) | 0.796 (±0.006) | 0.789 (±0.013) | 0.789 (±0.012) | 0.798 (±0.009) | 0.79 (±0.009)
DE | 0.764 (±0.003) | 0.792 (±0.005) | 0.792 (±0.005) | 0.792 (±0.005) | 0.788 (±0.005) | 0.775 (±0.009) | 0.773 (±0.009) | 0.795 (±0.006) | 0.799 (±0.006)
DK | 0.775 (±0.012) | 0.797 (±0.014) | 0.797 (±0.014) | 0.796 (±0.013) | 0.811 (±0.008) | 0.791 (±0.018) | 0.79 (±0.019) | 0.8 (±0.012) | 0.8 (±0.008)
EB | 0.701 (±0.015) | 0.707 (±0.017) | 0.707 (±0.017) | 0.707 (±0.021) | 0.713 (±0.014) | 0.688 (±0.022) | 0.699 (±0.004) | 0.709 (±0.016) | 0.707 (±0.014)
ED | 0.738 (±0.01) | 0.741 (±0.004) | 0.741 (±0.004) | 0.744 (±0.004) | 0.738 (±0.001) | 0.732 (±0.007) | 0.703 (±0.026) | 0.749 (±0.001) | 0.738 (±0.001)
EK | 0.855 (±0.015) | 0.875 (±0.007) | 0.875 (±0.007) | 0.875 (±0.007) | 0.878 (±0.008) | 0.852 (±0.01) | 0.856 (±0.014) | 0.878 (±0.009) | 0.867 (±0.009)
KB | 0.715 (±0.008) | 0.73 (±0.003) | 0.73 (±0.003) | 0.726 (±0.002) | 0.739 (±0.006) | 0.72 (±0.009) | 0.716 (±0.015) | 0.732 (±0.004) | 0.73 (±0.006)
KD | 0.736 (±0.014) | 0.761 (±0.013) | 0.761 (±0.013) | 0.755 (±0.011) | 0.764 (±0.018) | 0.734 (±0.012) | 0.728 (±0.045) | 0.758 (±0.017) | 0.75 (±0.018)
KE | 0.842 (±0.008) | 0.856 (±0.009) | 0.856 (±0.009) | 0.858 (±0.009) | 0.855 (±0.012) | 0.835 (±0.01) | 0.832 (±0.014) | 0.858 (±0.005) | 0.847 (±0.009)
Avg. | 0.767 (±0.009) | 0.785 (±0.009) | 0.785 (±0.009) | 0.785 (±0.01) | 0.787 (±0.01) | 0.769 (±0.012) | 0.766 (±0.017) | 0.787 (±0.009) | 0.784 (±0.009)
Task: CoDATS
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.783 (±0.013) | 0.8 (±0.002) | 0.8 (±0.002) | 0.8 (±0.004) | 0.803 (±0.003) | 0.788 (±0.008) | 0.792 (±0.001) | 0.805 (±0.009) | 0.801 (±0.008)
BE | 0.755 (±0.005) | 0.788 (±0.004) | 0.788 (±0.004) | 0.787 (±0.005) | 0.792 (±0.008) | 0.771 (±0.011) | 0.771 (±0.011) | 0.788 (±0.006) | 0.802 (±0.009)
BK | 0.771 (±0.022) | 0.808 (±0.007) | 0.808 (±0.007) | 0.808 (±0.008) | 0.807 (±0.008) | 0.775 (±0.022) | 0.793 (±0.023) | 0.806 (±0.009) | 0.817 (±0.007)
DB | 0.774 (±0.001) | 0.796 (±0.005) | 0.796 (±0.005) | 0.795 (±0.005) | 0.807 (±0.007) | 0.791 (±0.007) | 0.792 (±0.007) | 0.802 (±0.004) | 0.8 (±0.007)
DE | 0.769 (±0.003) | 0.808 (±0.009) | 0.808 (±0.009) | 0.81 (±0.009) | 0.808 (±0.008) | 0.789 (±0.02) | 0.777 (±0.008) | 0.811 (±0.009) | 0.817 (±0.011)
DK | 0.782 (±0.021) | 0.819 (±0.011) | 0.819 (±0.011) | 0.82 (±0.01) | 0.824 (±0.01) | 0.796 (±0.021) | 0.789 (±0.011) | 0.82 (±0.01) | 0.828 (±0.01)
EB | 0.687 (±0.013) | 0.714 (±0.019) | 0.714 (±0.019) | 0.713 (±0.015) | 0.725 (±0.021) | 0.7 (±0.027) | 0.688 (±0.01) | 0.717 (±0.018) | 0.723 (±0.021)
ED | 0.72 (±0.013) | 0.745 (±0.023) | 0.745 (±0.023) | 0.744 (±0.026) | 0.756 (±0.017) | 0.738 (±0.025) | 0.731 (±0.027) | 0.747 (±0.025) | 0.736 (±0.017)
EK | 0.859 (±0.016) | 0.883 (±0.009) | 0.883 (±0.009) | 0.882 (±0.008) | 0.879 (±0.007) | 0.867 (±0.01) | 0.866 (±0.01) | 0.883 (±0.012) | 0.87 (±0.012)
KB | 0.712 (±0.012) | 0.732 (±0.004) | 0.732 (±0.004) | 0.732 (±0.009) | 0.747 (±0.012) | 0.726 (±0.037) | 0.694 (±0.008) | 0.737 (±0.006) | 0.751 (±0.016)
KD | 0.73 (±0.022) | 0.759 (±0.006) | 0.759 (±0.006) | 0.766 (±0.009) | 0.775 (±0.016) | 0.752 (±0.015) | 0.738 (±0.007) | 0.766 (±0.004) | 0.764 (±0.016)
KE | 0.845 (±0.0) | 0.867 (±0.01) | 0.867 (±0.01) | 0.866 (±0.011) | 0.868 (±0.007) | 0.853 (±0.003) | 0.851 (±0.011) | 0.867 (±0.007) | 0.856 (±0.007)
Avg. | 0.766 (±0.012) | 0.793 (±0.009) | 0.793 (±0.009) | 0.794 (±0.01) | 0.799 (±0.01) | 0.779 (±0.017) | 0.773 (±0.011) | 0.796 (±0.01) | 0.797 (±0.012)
Task: Deep-Coral
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.778 (±0.018) | 0.799 (±0.011) | 0.799 (±0.011) | 0.799 (±0.006) | 0.794 (±0.011) | 0.792 (±0.023) | 0.789 (±0.025) | 0.801 (±0.01) | 0.794 (±0.012)
BE | 0.749 (±0.013) | 0.771 (±0.01) | 0.771 (±0.01) | 0.771 (±0.008) | 0.764 (±0.018) | 0.759 (±0.002) | 0.752 (±0.011) | 0.773 (±0.008) | 0.767 (±0.008)
BK | 0.769 (±0.011) | 0.785 (±0.013) | 0.785 (±0.013) | 0.784 (±0.016) | 0.783 (±0.015) | 0.762 (±0.013) | 0.77 (±0.014) | 0.785 (±0.017) | 0.779 (±0.017)
DB | 0.783 (±0.009) | 0.796 (±0.003) | 0.796 (±0.003) | 0.796 (±0.006) | 0.792 (±0.006) | 0.78 (±0.005) | 0.773 (±0.005) | 0.797 (±0.004) | 0.786 (±0.004)
DE | 0.76 (±0.013) | 0.785 (±0.002) | 0.785 (±0.002) | 0.785 (±0.002) | 0.778 (±0.006) | 0.762 (±0.006) | 0.764 (±0.008) | 0.784 (±0.003) | 0.785 (±0.005)
DK | 0.78 (±0.016) | 0.794 (±0.011) | 0.794 (±0.011) | 0.793 (±0.012) | 0.795 (±0.01) | 0.787 (±0.007) | 0.783 (±0.019) | 0.795 (±0.013) | 0.789 (±0.013)
EB | 0.696 (±0.009) | 0.708 (±0.022) | 0.708 (±0.022) | 0.709 (±0.024) | 0.708 (±0.024) | 0.688 (±0.016) | 0.698 (±0.018) | 0.71 (±0.024) | 0.708 (±0.027)
ED | 0.722 (±0.009) | 0.738 (±0.005) | 0.738 (±0.005) | 0.741 (±0.005) | 0.741 (±0.001) | 0.73 (±0.016) | 0.72 (±0.017) | 0.74 (±0.004) | 0.739 (±0.001)
EK | 0.859 (±0.01) | 0.879 (±0.005) | 0.879 (±0.005) | 0.878 (±0.007) | 0.877 (±0.007) | 0.857 (±0.007) | 0.858 (±0.009) | 0.878 (±0.007) | 0.864 (±0.005)
KB | 0.72 (±0.01) | 0.733 (±0.009) | 0.733 (±0.009) | 0.736 (±0.009) | 0.733 (±0.009) | 0.719 (±0.011) | 0.722 (±0.022) | 0.739 (±0.01) | 0.74 (±0.005)
KD | 0.733 (±0.011) | 0.755 (±0.012) | 0.755 (±0.012) | 0.751 (±0.009) | 0.759 (±0.009) | 0.75 (±0.009) | 0.752 (±0.013) | 0.755 (±0.015) | 0.751 (±0.009)
KE | 0.841 (±0.01) | 0.858 (±0.009) | 0.858 (±0.009) | 0.858 (±0.011) | 0.857 (±0.008) | 0.844 (±0.011) | 0.849 (±0.005) | 0.857 (±0.01) | 0.849 (±0.011)
Avg. | 0.766 (±0.012) | 0.783 (±0.009) | 0.783 (±0.009) | 0.783 (±0.01) | 0.782 (±0.011) | 0.769 (±0.011) | 0.769 (±0.014) | 0.785 (±0.01) | 0.779 (±0.01)
Table 17: Mean and standard deviation (after ± ) of target classification accuracy on Amazon Reviews (Part 3) over 3 repetitions with different random initialization of model weights.
Task: CDAN
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.784 (±0.022) | 0.803 (±0.007) | 0.803 (±0.007) | 0.803 (±0.004) | 0.795 (±0.009) | 0.795 (±0.009) | 0.792 (±0.006) | 0.804 (±0.009) | 0.797 (±0.006)
BE | 0.758 (±0.013) | 0.776 (±0.009) | 0.776 (±0.009) | 0.774 (±0.009) | 0.776 (±0.012) | 0.764 (±0.012) | 0.764 (±0.012) | 0.778 (±0.009) | 0.775 (±0.009)
BK | 0.769 (±0.019) | 0.796 (±0.008) | 0.796 (±0.008) | 0.796 (±0.008) | 0.792 (±0.014) | 0.78 (±0.006) | 0.791 (±0.007) | 0.796 (±0.01) | 0.796 (±0.01)
DB | 0.786 (±0.01) | 0.797 (±0.007) | 0.797 (±0.007) | 0.795 (±0.004) | 0.79 (±0.004) | 0.793 (±0.01) | 0.782 (±0.01) | 0.798 (±0.007) | 0.794 (±0.008)
DE | 0.761 (±0.003) | 0.8 (±0.004) | 0.8 (±0.004) | 0.8 (±0.006) | 0.8 (±0.005) | 0.775 (±0.007) | 0.77 (±0.009) | 0.799 (±0.004) | 0.794 (±0.006)
DK | 0.778 (±0.018) | 0.797 (±0.012) | 0.797 (±0.012) | 0.797 (±0.01) | 0.804 (±0.015) | 0.792 (±0.02) | 0.793 (±0.015) | 0.801 (±0.013) | 0.801 (±0.015)
EB | 0.692 (±0.014) | 0.707 (±0.015) | 0.707 (±0.015) | 0.707 (±0.015) | 0.713 (±0.024) | 0.695 (±0.012) | 0.707 (±0.021) | 0.71 (±0.018) | 0.711 (±0.024)
ED | 0.72 (±0.018) | 0.741 (±0.021) | 0.741 (±0.021) | 0.738 (±0.011) | 0.741 (±0.005) | 0.722 (±0.017) | 0.723 (±0.028) | 0.741 (±0.006) | 0.735 (±0.009)
EK | 0.86 (±0.006) | 0.879 (±0.011) | 0.879 (±0.011) | 0.878 (±0.01) | 0.874 (±0.012) | 0.861 (±0.018) | 0.861 (±0.018) | 0.879 (±0.011) | 0.865 (±0.011)
KB | 0.706 (±0.004) | 0.727 (±0.003) | 0.727 (±0.003) | 0.728 (±0.001) | 0.732 (±0.005) | 0.724 (±0.011) | 0.722 (±0.004) | 0.733 (±0.004) | 0.724 (±0.004)
KD | 0.748 (±0.013) | 0.765 (±0.013) | 0.765 (±0.013) | 0.764 (±0.011) | 0.766 (±0.021) | 0.752 (±0.021) | 0.755 (±0.005) | 0.762 (±0.01) | 0.752 (±0.021)
KE | 0.845 (±0.007) | 0.86 (±0.005) | 0.86 (±0.005) | 0.861 (±0.007) | 0.863 (±0.003) | 0.848 (±0.005) | 0.849 (±0.007) | 0.861 (±0.006) | 0.85 (±0.003)
Avg. | 0.767 (±0.012) | 0.787 (±0.01) | 0.787 (±0.01) | 0.787 (±0.008) | 0.787 (±0.011) | 0.775 (±0.012) | 0.776 (±0.012) | 0.788 (±0.009) | 0.783 (±0.01)
Task: DANN
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.783 (±0.004) | 0.794 (±0.003) | 0.794 (±0.003) | 0.794 (±0.008) | 0.803 (±0.007) | 0.782 (±0.005) | 0.782 (±0.005) | 0.804 (±0.008) | 0.813 (±0.017)
BE | 0.752 (±0.011) | 0.787 (±0.009) | 0.787 (±0.009) | 0.787 (±0.013) | 0.796 (±0.006) | 0.783 (±0.013) | 0.782 (±0.013) | 0.789 (±0.008) | 0.8 (±0.005)
BK | 0.767 (±0.022) | 0.804 (±0.007) | 0.804 (±0.007) | 0.804 (±0.008) | 0.805 (±0.004) | 0.796 (±0.003) | 0.806 (±0.015) | 0.804 (±0.008) | 0.82 (±0.004)
DB | 0.779 (±0.002) | 0.8 (±0.007) | 0.8 (±0.007) | 0.799 (±0.006) | 0.802 (±0.003) | 0.796 (±0.004) | 0.778 (±0.014) | 0.804 (±0.004) | 0.799 (±0.004)
DE | 0.767 (±0.016) | 0.805 (±0.006) | 0.805 (±0.006) | 0.807 (±0.009) | 0.808 (±0.012) | 0.775 (±0.023) | 0.78 (±0.035) | 0.81 (±0.007) | 0.815 (±0.007)
DK | 0.784 (±0.013) | 0.815 (±0.008) | 0.815 (±0.008) | 0.817 (±0.011) | 0.829 (±0.014) | 0.788 (±0.007) | 0.779 (±0.012) | 0.816 (±0.011) | 0.827 (±0.014)
EB | 0.701 (±0.019) | 0.712 (±0.014) | 0.712 (±0.014) | 0.712 (±0.015) | 0.724 (±0.014) | 0.69 (±0.03) | 0.702 (±0.022) | 0.712 (±0.012) | 0.721 (±0.014)
ED | 0.736 (±0.005) | 0.743 (±0.014) | 0.743 (±0.014) | 0.747 (±0.012) | 0.758 (±0.012) | 0.735 (±0.053) | 0.738 (±0.019) | 0.751 (±0.008) | 0.757 (±0.012)
EK | 0.854 (±0.016) | 0.877 (±0.011) | 0.877 (±0.011) | 0.878 (±0.012) | 0.879 (±0.014) | 0.86 (±0.009) | 0.858 (±0.007) | 0.88 (±0.013) | 0.875 (±0.012)
KB | 0.711 (±0.014) | 0.741 (±0.004) | 0.741 (±0.004) | 0.74 (±0.004) | 0.759 (±0.007) | 0.718 (±0.012) | 0.729 (±0.003) | 0.75 (±0.004) | 0.747 (±0.007)
KD | 0.738 (±0.006) | 0.768 (±0.011) | 0.768 (±0.011) | 0.765 (±0.011) | 0.778 (±0.013) | 0.733 (±0.013) | 0.758 (±0.019) | 0.778 (±0.012) | 0.762 (±0.012)
KE | 0.837 (±0.013) | 0.864 (±0.008) | 0.864 (±0.008) | 0.864 (±0.007) | 0.865 (±0.009) | 0.86 (±0.013) | 0.851 (±0.008) | 0.866 (±0.011) | 0.859 (±0.011)
Avg. | 0.767 (±0.012) | 0.792 (±0.009) | 0.792 (±0.009) | 0.793 (±0.01) | 0.8 (±0.01) | 0.776 (±0.015) | 0.778 (±0.015) | 0.797 (±0.009) | 0.8 (±0.01)
Task: DSAN
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
BD | 0.779 (±0.013) | 0.801 (±0.014) | 0.801 (±0.014) | 0.803 (±0.015) | 0.807 (±0.01) | 0.79 (±0.02) | 0.689 (±0.108) | 0.805 (±0.014) | 0.805 (±0.01)
BE | 0.752 (±0.005) | 0.788 (±0.005) | 0.788 (±0.005) | 0.783 (±0.005) | 0.798 (±0.002) | 0.775 (±0.022) | 0.764 (±0.02) | 0.787 (±0.006) | 0.796 (±0.002)
BK | 0.768 (±0.012) | 0.799 (±0.008) | 0.799 (±0.008) | 0.799 (±0.01) | 0.815 (±0.007) | 0.786 (±0.036) | 0.798 (±0.031) | 0.801 (±0.011) | 0.816 (±0.005)
DB | 0.782 (±0.008) | 0.795 (±0.004) | 0.795 (±0.004) | 0.795 (±0.002) | 0.799 (±0.007) | 0.797 (±0.01) | 0.729 (±0.111) | 0.801 (±0.005) | 0.802 (±0.007)
DE | 0.771 (±0.002) | 0.804 (±0.008) | 0.804 (±0.008) | 0.804 (±0.008) | 0.816 (±0.012) | 0.79 (±0.02) | 0.792 (±0.02) | 0.808 (±0.003) | 0.813 (±0.012)
DK | 0.786 (±0.013) | 0.804 (±0.01) | 0.804 (±0.01) | 0.802 (±0.012) | 0.821 (±0.014) | 0.789 (±0.013) | 0.801 (±0.025) | 0.807 (±0.012) | 0.83 (±0.016)
EB | 0.702 (±0.021) | 0.716 (±0.017) | 0.716 (±0.017) | 0.718 (±0.021) | 0.718 (±0.018) | 0.71 (±0.021) | 0.707 (±0.018) | 0.721 (±0.02) | 0.711 (±0.02)
ED | 0.725 (±0.007) | 0.743 (±0.008) | 0.743 (±0.008) | 0.741 (±0.005) | 0.744 (±0.008) | 0.723 (±0.021) | 0.735 (±0.012) | 0.749 (±0.005) | 0.732 (±0.005)
EK | 0.865 (±0.013) | 0.883 (±0.009) | 0.883 (±0.009) | 0.882 (±0.008) | 0.886 (±0.009) | 0.874 (±0.003) | 0.876 (±0.006) | 0.883 (±0.01) | 0.876 (±0.009)
KB | 0.716 (±0.016) | 0.741 (±0.01) | 0.741 (±0.01) | 0.736 (±0.007) | 0.746 (±0.011) | 0.712 (±0.014) | 0.707 (±0.016) | 0.742 (±0.008) | 0.739 (±0.011)
KD | 0.739 (±0.023) | 0.773 (±0.006) | 0.773 (±0.006) | 0.768 (±0.003) | 0.777 (±0.01) | 0.746 (±0.016) | 0.726 (±0.035) | 0.776 (±0.005) | 0.763 (±0.012)
KE | 0.84 (±0.01) | 0.86 (±0.005) | 0.86 (±0.005) | 0.858 (±0.007) | 0.868 (±0.005) | 0.854 (±0.001) | 0.835 (±0.04) | 0.865 (±0.008) | 0.857 (±0.005)
Avg. | 0.769 (±0.012) | 0.792 (±0.009) | 0.792 (±0.009) | 0.791 (±0.009) | 0.8 (±0.009) | 0.779 (±0.016) | 0.763 (±0.037) | 0.795 (±0.009) | 0.795 (±0.009)

D.7.3 Detailed MiniDomainNet Results

Table 18: Mean and standard deviation (after ±) of target classification accuracy on the MiniDomainNet dataset over three different random initializations of model weights and five domain adaptation tasks.
MiniDomainNet
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Method | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
HoMM | 0.509 (±0.018) | 0.526 (±0.009) | 0.524 (±0.010) | 0.523 (±0.010) | 0.518 (±0.015) | 0.511 (±0.018) | 0.511 (±0.018) | 0.531 (±0.007) | 0.537 (±0.020)
AdvSKM | 0.509 (±0.015) | 0.516 (±0.011) | 0.514 (±0.014) | 0.514 (±0.017) | 0.514 (±0.009) | 0.515 (±0.032) | 0.512 (±0.032) | 0.516 (±0.011) | 0.522 (±0.017)
DIRT | 0.499 (±0.022) | 0.517 (±0.006) | 0.515 (±0.009) | 0.508 (±0.006) | 0.507 (±0.018) | 0.493 (±0.028) | 0.498 (±0.033) | 0.519 (±0.006) | 0.525 (±0.020)
DDC | 0.510 (±0.032) | 0.514 (±0.012) | 0.512 (±0.018) | 0.511 (±0.019) | 0.514 (±0.015) | 0.511 (±0.028) | 0.512 (±0.043) | 0.516 (±0.015) | 0.521 (±0.019)
CMD | 0.509 (±0.019) | 0.528 (±0.011) | 0.531 (±0.016) | 0.531 (±0.016) | 0.522 (±0.013) | 0.518 (±0.025) | 0.490 (±0.037) | 0.533 (±0.009) | 0.533 (±0.024)
MMDA | 0.509 (±0.021) | 0.524 (±0.004) | 0.522 (±0.005) | 0.523 (±0.007) | 0.519 (±0.012) | 0.517 (±0.008) | 0.524 (±0.006) | 0.527 (±0.009) | 0.531 (±0.029)
CoDATS | 0.502 (±0.026) | 0.535 (±0.026) | 0.533 (±0.027) | 0.536 (±0.028) | 0.525 (±0.015) | 0.518 (±0.031) | 0.524 (±0.051) | 0.536 (±0.025) | 0.529 (±0.042)
Deep-Coral | 0.505 (±0.022) | 0.538 (±0.009) | 0.534 (±0.013) | 0.533 (±0.012) | 0.520 (±0.010) | 0.518 (±0.022) | 0.524 (±0.034) | 0.539 (±0.012) | 0.535 (±0.024)
CDAN | 0.514 (±0.028) | 0.526 (±0.013) | 0.524 (±0.011) | 0.525 (±0.008) | 0.517 (±0.005) | 0.513 (±0.011) | 0.516 (±0.014) | 0.532 (±0.010) | 0.542 (±0.017)
DANN | 0.496 (±0.023) | 0.530 (±0.011) | 0.535 (±0.011) | 0.538 (±0.005) | 0.516 (±0.011) | 0.519 (±0.015) | 0.515 (±0.013) | 0.541 (±0.006) | 0.532 (±0.014)
DSAN | 0.509 (±0.022) | 0.537 (±0.011) | 0.534 (±0.013) | 0.543 (±0.011) | 0.525 (±0.013) | 0.513 (±0.018) | 0.535 (±0.023) | 0.546 (±0.007) | 0.563 (±0.013)
Avg. | 0.507 (±0.022) | 0.526 (±0.011) | 0.525 (±0.014) | 0.526 (±0.013) | 0.518 (±0.012) | 0.513 (±0.022) | 0.515 (±0.028) | 0.531 (±0.011) | 0.534 (±0.022)
Table 19: Mean and standard deviation (after ± ) of target classification accuracy on MiniDomainNet (Part 1) over 3 repetitions with different random initialization of model weights.
Task: HoMM
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
RC | 0.552 (±0.041) | 0.587 (±0.017) | 0.587 (±0.017) | 0.587 (±0.005) | 0.557 (±0.008) | 0.538 (±0.029) | 0.56 (±0.017) | 0.596 (±0.009) | 0.607 (±0.022)
RI | 0.373 (±0.014) | 0.376 (±0.012) | 0.376 (±0.012) | 0.373 (±0.005) | 0.386 (±0.007) | 0.37 (±0.01) | 0.37 (±0.01) | 0.392 (±0.007) | 0.413 (±0.041)
RP | 0.709 (±0.026) | 0.722 (±0.01) | 0.722 (±0.01) | 0.718 (±0.006) | 0.709 (±0.011) | 0.704 (±0.013) | 0.705 (±0.014) | 0.73 (±0.003) | 0.721 (±0.003)
RO | 0.332 (±0.009) | 0.35 (±0.006) | 0.35 (±0.006) | 0.353 (±0.008) | 0.356 (±0.02) | 0.367 (±0.019) | 0.342 (±0.018) | 0.353 (±0.003) | 0.37 (±0.013)
RS | 0.579 (±0.022) | 0.587 (±0.012) | 0.587 (±0.012) | 0.586 (±0.003) | 0.582 (±0.01) | 0.577 (±0.004) | 0.577 (±0.004) | 0.586 (±0.008) | 0.609 (±0.011)
Avg. | 0.509 (±0.023) | 0.524 (±0.011) | 0.524 (±0.011) | 0.523 (±0.005) | 0.518 (±0.011) | 0.511 (±0.015) | 0.511 (±0.013) | 0.531 (±0.006) | 0.544 (±0.018)
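The Avg. rows in these per-method tables appear to be the unweighted mean over the five adaptation tasks; a quick consistency check on the HoMM SO column (values taken from the table above, assuming this averaging convention):

```python
from statistics import fmean

# HoMM, SO column: per-task mean accuracies for RC, RI, RP, RO, RS.
per_task = [0.552, 0.373, 0.709, 0.332, 0.579]

avg = round(fmean(per_task), 3)
print(avg)  # 0.509, matching the Avg. row of the HoMM table
```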
Task: AdvSKM
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
RC | 0.544 (±0.017) | 0.56 (±0.025) | 0.56 (±0.025) | 0.566 (±0.03) | 0.544 (±0.024) | 0.549 (±0.022) | 0.549 (±0.022) | 0.568 (±0.009) | 0.582 (±0.046)
RI | 0.38 (±0.018) | 0.373 (±0.016) | 0.373 (±0.016) | 0.371 (±0.009) | 0.394 (±0.037) | 0.394 (±0.021) | 0.394 (±0.021) | 0.377 (±0.013) | 0.395 (±0.02)
RP | 0.723 (±0.009) | 0.721 (±0.004) | 0.721 (±0.004) | 0.719 (±0.003) | 0.712 (±0.006) | 0.706 (±0.041) | 0.695 (±0.034) | 0.721 (±0.003) | 0.723 (±0.009)
RO | 0.322 (±0.027) | 0.332 (±0.006) | 0.332 (±0.006) | 0.328 (±0.005) | 0.335 (±0.002) | 0.335 (±0.002) | 0.332 (±0.006) | 0.333 (±0.005) | 0.335 (±0.004)
RS | 0.579 (±0.017) | 0.586 (±0.001) | 0.586 (±0.001) | 0.584 (±0.003) | 0.586 (±0.005) | 0.589 (±0.007) | 0.589 (±0.007) | 0.582 (±0.004) | 0.592 (±0.009)
Avg. | 0.509 (±0.018) | 0.514 (±0.01) | 0.514 (±0.01) | 0.514 (±0.01) | 0.514 (±0.015) | 0.515 (±0.018) | 0.512 (±0.018) | 0.516 (±0.007) | 0.525 (±0.018)
Task: DIRT
Heuristic: SO, TMV, TMR, TCR. Theoretical error guarantees: SOR, IWV, DEV, IWA (ours).

Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB
RC | 0.53 (±0.013) | 0.577 (±0.033) | 0.577 (±0.033) | 0.546 (±0.031) | 0.571 (±0.017) | 0.541 (±0.025) | 0.544 (±0.025) | 0.577 (±0.025) | 0.587 (±0.037)
RI | 0.373 (±0.038) | 0.365 (±0.041) | 0.365 (±0.041) | 0.374 (±0.037) | 0.37 (±0.007) | 0.36 (±0.011) | 0.385 (±0.062) | 0.38 (±0.031) | 0.431 (±0.045)
RP | 0.704 (±0.031) | 0.708 (±0.026) | 0.708 (±0.026) | 0.706 (±0.02) | 0.689 (±0.015) | 0.663 (±0.03) | 0.67 (±0.042) | 0.713 (±0.03) | 0.715 (±0.028)
RO | 0.34 (±0.02) | 0.345 (±0.031) | 0.345 (±0.031) | 0.34 (±0.046) | 0.336 (±0.024) | 0.326 (±0.086) | 0.365 (±0.038) | 0.342 (±0.022) | 0.382 (±0.027)
RS | 0.55 (±0.028) | 0.578 (±0.006) | 0.578 (±0.006) | 0.571 (±0.006) | 0.571 (±0.012) | 0.572 (±0.005) | 0.525 (±0.088) | 0.583 (±0.016) | 0.588 (±0.033)
Avg. | 0.499 (±0.026) | 0.515 (±0.027) | 0.515 (±0.027) | 0.508 (±0.028) | 0.507 (±0.015) | 0.493 (±0.031) | 0.498 (±0.051) | 0.519 (±0.025) | 0.541 (±0.034)
DDC (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| R→C | 0.555(±0.045) | 0.555(±0.013) | 0.555(±0.013) | 0.549(±0.008) | 0.552(±0.009) | 0.546(±0.013) | 0.546(±0.013) | 0.568(±0.025) | 0.585(±0.076) |
| R→I | 0.374(±0.014) | 0.391(±0.003) | 0.391(±0.003) | 0.38(±0.014) | 0.398(±0.024) | 0.379(±0.009) | 0.379(±0.009) | 0.385(±0.008) | 0.416(±0.014) |
| R→P | 0.709(±0.025) | 0.714(±0.0) | 0.714(±0.0) | 0.712(±0.004) | 0.709(±0.004) | 0.713(±0.006) | 0.718(±0.003) | 0.715(±0.002) | 0.717(±0.004) |
| R→Q | 0.334(±0.002) | 0.331(±0.003) | 0.331(±0.003) | 0.333(±0.0) | 0.327(±0.018) | 0.333(±0.005) | 0.335(±0.002) | 0.335(±0.003) | 0.337(±0.004) |
| R→S | 0.576(±0.022) | 0.571(±0.01) | 0.571(±0.01) | 0.578(±0.008) | 0.582(±0.005) | 0.582(±0.005) | 0.582(±0.005) | 0.579(±0.007) | 0.585(±0.021) |
| Avg. | 0.51(±0.021) | 0.512(±0.005) | 0.512(±0.005) | 0.511(±0.007) | 0.514(±0.012) | 0.511(±0.008) | 0.512(±0.006) | 0.516(±0.009) | 0.528(±0.024) |
CMD (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| R→C | 0.555(±0.042) | 0.598(±0.008) | 0.598(±0.008) | 0.59(±0.008) | 0.544(±0.021) | 0.552(±0.031) | 0.443(±0.067) | 0.593(±0.009) | 0.617(±0.025) |
| R→I | 0.377(±0.009) | 0.368(±0.007) | 0.368(±0.007) | 0.371(±0.005) | 0.395(±0.007) | 0.379(±0.011) | 0.379(±0.011) | 0.379(±0.007) | 0.389(±0.007) |
| R→P | 0.704(±0.02) | 0.741(±0.015) | 0.741(±0.015) | 0.733(±0.01) | 0.723(±0.02) | 0.732(±0.019) | 0.736(±0.013) | 0.74(±0.004) | 0.734(±0.015) |
| R→Q | 0.332(±0.015) | 0.365(±0.005) | 0.365(±0.005) | 0.374(±0.003) | 0.361(±0.036) | 0.338(±0.059) | 0.303(±0.061) | 0.364(±0.007) | 0.378(±0.008) |
| R→S | 0.578(±0.022) | 0.582(±0.01) | 0.582(±0.01) | 0.587(±0.005) | 0.589(±0.005) | 0.588(±0.019) | 0.591(±0.015) | 0.592(±0.003) | 0.59(±0.003) |
| Avg. | 0.509(±0.022) | 0.531(±0.009) | 0.531(±0.009) | 0.531(±0.006) | 0.522(±0.018) | 0.518(±0.028) | 0.49(±0.033) | 0.533(±0.006) | 0.542(±0.012) |
MMDA (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| R→C | 0.555(±0.042) | 0.587(±0.009) | 0.587(±0.009) | 0.571(±0.013) | 0.56(±0.019) | 0.544(±0.033) | 0.544(±0.033) | 0.585(±0.009) | 0.585(±0.009) |
| R→I | 0.377(±0.009) | 0.379(±0.007) | 0.379(±0.007) | 0.376(±0.012) | 0.389(±0.008) | 0.389(±0.021) | 0.389(±0.021) | 0.373(±0.003) | 0.415(±0.016) |
| R→P | 0.704(±0.02) | 0.714(±0.014) | 0.714(±0.014) | 0.714(±0.011) | 0.717(±0.014) | 0.713(±0.003) | 0.721(±0.015) | 0.723(±0.004) | 0.735(±0.007) |
| R→Q | 0.332(±0.015) | 0.352(±0.023) | 0.352(±0.023) | 0.372(±0.006) | 0.335(±0.018) | 0.353(±0.025) | 0.38(±0.035) | 0.362(±0.009) | 0.392(±0.018) |
| R→S | 0.578(±0.022) | 0.581(±0.011) | 0.581(±0.011) | 0.583(±0.013) | 0.592(±0.001) | 0.587(±0.01) | 0.587(±0.01) | 0.593(±0.009) | 0.605(±0.008) |
| Avg. | 0.509(±0.022) | 0.522(±0.013) | 0.522(±0.013) | 0.523(±0.011) | 0.519(±0.013) | 0.517(±0.018) | 0.524(±0.023) | 0.527(±0.007) | 0.546(±0.012) |
Table 20: Mean and standard deviation (after ±) of target classification accuracy on MiniDomainNet (Part 2) over 3 repetitions with different random initializations of model weights.
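Each cell in these tables condenses the accuracies of the 3 repeated runs into a single "mean(±std)" entry. A minimal sketch of that aggregation is shown below; the rounding to three decimals matches the tables, while the choice of the sample (n−1) standard deviation is an assumption, since the thesis could equally report the population variant.

```python
import statistics

def summarize(accuracies):
    """Format repeated-run accuracies as 'mean(±std)' like the table cells.

    Assumes the sample standard deviation (n-1 denominator) and rounding
    to three decimals; both are conventions, not confirmed by the source.
    """
    mean = statistics.mean(accuracies)
    std = statistics.stdev(accuracies)  # sample standard deviation
    return f"{round(mean, 3)}(±{round(std, 3)})"

# Hypothetical per-seed accuracies for one (algorithm, scenario) pair:
print(summarize([0.52, 0.55, 0.56]))  # -> 0.543(±0.021)
```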
CoDATS (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| R→C | 0.538(±0.072) | 0.585(±0.025) | 0.585(±0.025) | 0.607(±0.028) | 0.585(±0.013) | 0.555(±0.031) | 0.626(±0.058) | 0.577(±0.026) | 0.617(±0.058) |
| R→I | 0.365(±0.037) | 0.371(±0.024) | 0.371(±0.024) | 0.367(±0.016) | 0.38(±0.02) | 0.373(±0.041) | 0.363(±0.056) | 0.371(±0.012) | 0.406(±0.021) |
| R→P | 0.689(±0.024) | 0.735(±0.01) | 0.735(±0.01) | 0.734(±0.015) | 0.726(±0.014) | 0.689(±0.043) | 0.689(±0.043) | 0.743(±0.016) | 0.732(±0.016) |
| R→Q | 0.322(±0.022) | 0.356(±0.02) | 0.356(±0.02) | 0.358(±0.017) | 0.34(±0.017) | 0.384(±0.012) | 0.348(±0.044) | 0.364(±0.012) | 0.417(±0.025) |
| R→S | 0.597(±0.004) | 0.619(±0.013) | 0.619(±0.013) | 0.616(±0.018) | 0.593(±0.01) | 0.592(±0.013) | 0.592(±0.013) | 0.622(±0.009) | 0.627(±0.008) |
| Avg. | 0.502(±0.032) | 0.533(±0.018) | 0.533(±0.018) | 0.536(±0.019) | 0.525(±0.015) | 0.518(±0.028) | 0.524(±0.043) | 0.536(±0.015) | 0.56(±0.026) |
Deep-Coral (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| R→C | 0.555(±0.045) | 0.585(±0.017) | 0.585(±0.017) | 0.582(±0.014) | 0.557(±0.0) | 0.536(±0.025) | 0.568(±0.013) | 0.59(±0.022) | 0.601(±0.056) |
| R→I | 0.376(±0.012) | 0.37(±0.01) | 0.37(±0.01) | 0.368(±0.01) | 0.38(±0.005) | 0.368(±0.003) | 0.368(±0.003) | 0.367(±0.012) | 0.386(±0.007) |
| R→P | 0.711(±0.021) | 0.726(±0.015) | 0.726(±0.015) | 0.723(±0.007) | 0.714(±0.007) | 0.722(±0.007) | 0.722(±0.012) | 0.738(±0.012) | 0.734(±0.012) |
| R→Q | 0.316(±0.041) | 0.383(±0.003) | 0.383(±0.003) | 0.386(±0.004) | 0.365(±0.008) | 0.378(±0.011) | 0.375(±0.033) | 0.38(±0.002) | 0.388(±0.002) |
| R→S | 0.569(±0.02) | 0.606(±0.009) | 0.606(±0.009) | 0.606(±0.005) | 0.581(±0.007) | 0.588(±0.008) | 0.588(±0.008) | 0.622(±0.001) | 0.658(±0.035) |
| Avg. | 0.505(±0.028) | 0.534(±0.011) | 0.534(±0.011) | 0.533(±0.008) | 0.52(±0.005) | 0.518(±0.011) | 0.524(±0.014) | 0.539(±0.01) | 0.554(±0.022) |
CDAN (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| R→C | 0.557(±0.008) | 0.587(±0.013) | 0.587(±0.013) | 0.593(±0.026) | 0.563(±0.009) | 0.552(±0.025) | 0.574(±0.05) | 0.598(±0.022) | 0.615(±0.008) |
| R→I | 0.386(±0.01) | 0.359(±0.005) | 0.359(±0.005) | 0.357(±0.009) | 0.368(±0.005) | 0.363(±0.04) | 0.335(±0.016) | 0.357(±0.009) | 0.386(±0.01) |
| R→P | 0.709(±0.022) | 0.732(±0.02) | 0.732(±0.02) | 0.734(±0.023) | 0.71(±0.006) | 0.72(±0.039) | 0.723(±0.036) | 0.731(±0.004) | 0.716(±0.023) |
| R→Q | 0.342(±0.006) | 0.352(±0.024) | 0.352(±0.024) | 0.349(±0.02) | 0.347(±0.016) | 0.356(±0.034) | 0.38(±0.028) | 0.364(±0.013) | 0.442(±0.114) |
| R→S | 0.576(±0.027) | 0.588(±0.01) | 0.588(±0.01) | 0.593(±0.005) | 0.596(±0.01) | 0.574(±0.022) | 0.568(±0.03) | 0.61(±0.008) | 0.651(±0.056) |
| Avg. | 0.514(±0.015) | 0.524(±0.014) | 0.524(±0.014) | 0.525(±0.017) | 0.517(±0.009) | 0.513(±0.032) | 0.516(±0.032) | 0.532(±0.011) | 0.562(±0.042) |
DANN (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| R→C | 0.519(±0.013) | 0.626(±0.021) | 0.626(±0.021) | 0.623(±0.036) | 0.557(±0.014) | 0.582(±0.022) | 0.56(±0.07) | 0.607(±0.008) | 0.631(±0.038) |
| R→I | 0.376(±0.016) | 0.37(±0.013) | 0.37(±0.013) | 0.377(±0.013) | 0.389(±0.032) | 0.379(±0.036) | 0.379(±0.036) | 0.388(±0.019) | 0.4(±0.025) |
| R→P | 0.705(±0.011) | 0.704(±0.013) | 0.704(±0.013) | 0.704(±0.011) | 0.696(±0.006) | 0.699(±0.006) | 0.697(±0.014) | 0.721(±0.008) | 0.713(±0.008) |
| R→Q | 0.3(±0.043) | 0.377(±0.008) | 0.377(±0.008) | 0.38(±0.006) | 0.343(±0.004) | 0.351(±0.048) | 0.351(±0.048) | 0.374(±0.003) | 0.392(±0.005) |
| R→S | 0.581(±0.014) | 0.598(±0.023) | 0.598(±0.023) | 0.607(±0.016) | 0.596(±0.01) | 0.586(±0.015) | 0.586(±0.015) | 0.617(±0.005) | 0.657(±0.015) |
| Avg. | 0.496(±0.019) | 0.535(±0.016) | 0.535(±0.016) | 0.538(±0.016) | 0.516(±0.013) | 0.519(±0.025) | 0.515(±0.037) | 0.541(±0.009) | 0.559(±0.018) |
DSAN (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| R→C | 0.555(±0.042) | 0.637(±0.021) | 0.637(±0.021) | 0.639(±0.022) | 0.566(±0.016) | 0.557(±0.059) | 0.571(±0.045) | 0.639(±0.008) | 0.648(±0.014) |
| R→I | 0.377(±0.009) | 0.363(±0.01) | 0.363(±0.01) | 0.374(±0.015) | 0.406(±0.005) | 0.362(±0.034) | 0.362(±0.034) | 0.377(±0.016) | 0.404(±0.005) |
| R→P | 0.705(±0.021) | 0.712(±0.009) | 0.712(±0.009) | 0.714(±0.005) | 0.72(±0.008) | 0.716(±0.003) | 0.719(±0.003) | 0.719(±0.007) | 0.716(±0.008) |
| R→Q | 0.332(±0.012) | 0.364(±0.021) | 0.364(±0.021) | 0.38(±0.012) | 0.34(±0.016) | 0.342(±0.009) | 0.437(±0.083) | 0.372(±0.011) | 0.478(±0.013) |
| R→S | 0.577(±0.026) | 0.594(±0.004) | 0.594(±0.004) | 0.607(±0.005) | 0.594(±0.003) | 0.585(±0.004) | 0.585(±0.004) | 0.625(±0.016) | 0.639(±0.052) |
| Avg. | 0.509(±0.022) | 0.534(±0.013) | 0.534(±0.013) | 0.543(±0.012) | 0.525(±0.01) | 0.513(±0.022) | 0.535(±0.034) | 0.546(±0.012) | 0.577(±0.018) |

D.7.4 DETAILED TIME-SERIES RESULTS

Table 21: Mean and standard deviation (after ±) of target classification accuracy on Sleep-EDF (Part 1) over 3 repetitions with different random initializations of model weights.
HoMM (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.565(±0.033) | 0.557(±0.066) | 0.557(±0.066) | 0.553(±0.068) | 0.72(±0.052) | 0.694(±0.075) | 0.488(±0.09) | 0.719(±0.04) | 0.654(±0.052) |
| 12→5 | 0.698(±0.081) | 0.771(±0.014) | 0.771(±0.014) | 0.768(±0.008) | 0.788(±0.022) | 0.776(±0.026) | 0.776(±0.026) | 0.762(±0.022) | 0.811(±0.013) |
| 16→1 | 0.664(±0.068) | 0.702(±0.02) | 0.702(±0.02) | 0.698(±0.025) | 0.616(±0.017) | 0.628(±0.017) | 0.628(±0.017) | 0.682(±0.008) | 0.729(±0.025) |
| 7→18 | 0.712(±0.024) | 0.734(±0.021) | 0.734(±0.021) | 0.736(±0.012) | 0.678(±0.006) | 0.729(±0.025) | 0.734(±0.028) | 0.74(±0.008) | 0.755(±0.02) |
| 9→14 | 0.741(±0.031) | 0.832(±0.008) | 0.832(±0.008) | 0.833(±0.01) | 0.82(±0.024) | 0.802(±0.025) | 0.764(±0.07) | 0.832(±0.01) | 0.814(±0.01) |
| Avg. | 0.676(±0.047) | 0.719(±0.026) | 0.719(±0.026) | 0.718(±0.025) | 0.724(±0.024) | 0.726(±0.034) | 0.678(±0.046) | 0.747(±0.018) | 0.753(±0.024) |
AdvSKM (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.611(±0.005) | 0.59(±0.02) | 0.59(±0.02) | 0.589(±0.025) | 0.706(±0.043) | 0.645(±0.038) | 0.595(±0.035) | 0.664(±0.05) | 0.611(±0.043) |
| 12→5 | 0.723(±0.061) | 0.75(±0.02) | 0.75(±0.02) | 0.747(±0.011) | 0.777(±0.024) | 0.753(±0.015) | 0.753(±0.015) | 0.754(±0.027) | 0.777(±0.051) |
| 16→1 | 0.558(±0.035) | 0.667(±0.037) | 0.667(±0.037) | 0.671(±0.035) | 0.619(±0.043) | 0.65(±0.076) | 0.65(±0.076) | 0.658(±0.03) | 0.73(±0.033) |
| 7→18 | 0.651(±0.053) | 0.72(±0.022) | 0.72(±0.022) | 0.72(±0.018) | 0.669(±0.018) | 0.706(±0.041) | 0.681(±0.023) | 0.707(±0.004) | 0.723(±0.033) |
| 9→14 | 0.78(±0.027) | 0.835(±0.016) | 0.835(±0.016) | 0.835(±0.016) | 0.819(±0.032) | 0.76(±0.059) | 0.78(±0.027) | 0.824(±0.014) | 0.789(±0.016) |
| Avg. | 0.665(±0.036) | 0.712(±0.023) | 0.712(±0.023) | 0.712(±0.021) | 0.718(±0.032) | 0.703(±0.046) | 0.692(±0.035) | 0.722(±0.025) | 0.726(±0.035) |
DIRT (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.582(±0.041) | 0.493(±0.106) | 0.493(±0.106) | 0.509(±0.094) | 0.625(±0.031) | 0.447(±0.123) | 0.487(±0.188) | 0.54(±0.043) | 0.582(±0.031) |
| 12→5 | 0.712(±0.08) | 0.835(±0.015) | 0.835(±0.015) | 0.832(±0.017) | 0.87(±0.019) | 0.805(±0.061) | 0.789(±0.105) | 0.829(±0.02) | 0.857(±0.019) |
| 16→1 | 0.526(±0.162) | 0.757(±0.025) | 0.757(±0.025) | 0.755(±0.021) | 0.671(±0.071) | 0.618(±0.124) | 0.618(±0.124) | 0.746(±0.03) | 0.797(±0.02) |
| 7→18 | 0.749(±0.041) | 0.777(±0.018) | 0.777(±0.018) | 0.775(±0.005) | 0.717(±0.044) | 0.74(±0.069) | 0.719(±0.072) | 0.762(±0.0) | 0.789(±0.02) |
| 9→14 | 0.711(±0.098) | 0.862(±0.016) | 0.862(±0.016) | 0.871(±0.014) | 0.826(±0.027) | 0.786(±0.071) | 0.818(±0.045) | 0.868(±0.012) | 0.874(±0.022) |
| Avg. | 0.656(±0.084) | 0.745(±0.036) | 0.745(±0.036) | 0.748(±0.03) | 0.742(±0.039) | 0.679(±0.09) | 0.686(±0.107) | 0.749(±0.021) | 0.78(±0.023) |
DDC (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.546(±0.033) | 0.603(±0.029) | 0.603(±0.029) | 0.598(±0.027) | 0.658(±0.06) | 0.629(±0.059) | 0.492(±0.032) | 0.645(±0.095) | 0.641(±0.06) |
| 12→5 | 0.695(±0.075) | 0.751(±0.01) | 0.751(±0.01) | 0.749(±0.012) | 0.779(±0.029) | 0.716(±0.046) | 0.716(±0.046) | 0.755(±0.015) | 0.783(±0.013) |
| 16→1 | 0.551(±0.091) | 0.683(±0.013) | 0.683(±0.013) | 0.672(±0.005) | 0.56(±0.028) | 0.618(±0.021) | 0.623(±0.012) | 0.666(±0.017) | 0.728(±0.051) |
| 7→18 | 0.699(±0.022) | 0.729(±0.038) | 0.729(±0.038) | 0.717(±0.018) | 0.674(±0.009) | 0.724(±0.029) | 0.712(±0.036) | 0.733(±0.013) | 0.743(±0.033) |
| 9→14 | 0.737(±0.035) | 0.84(±0.014) | 0.84(±0.014) | 0.827(±0.025) | 0.802(±0.014) | 0.784(±0.01) | 0.788(±0.042) | 0.823(±0.016) | 0.792(±0.014) |
| Avg. | 0.646(±0.051) | 0.721(±0.021) | 0.721(±0.021) | 0.712(±0.018) | 0.695(±0.028) | 0.694(±0.033) | 0.666(±0.034) | 0.724(±0.031) | 0.737(±0.034) |
CMD (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.536(±0.041) | 0.564(±0.01) | 0.564(±0.01) | 0.551(±0.027) | 0.655(±0.079) | 0.607(±0.087) | 0.505(±0.016) | 0.581(±0.009) | 0.611(±0.079) |
| 12→5 | 0.751(±0.032) | 0.805(±0.012) | 0.805(±0.012) | 0.79(±0.016) | 0.798(±0.016) | 0.784(±0.03) | 0.629(±0.044) | 0.793(±0.014) | 0.835(±0.02) |
| 16→1 | 0.557(±0.099) | 0.747(±0.011) | 0.747(±0.011) | 0.701(±0.034) | 0.624(±0.018) | 0.678(±0.005) | 0.665(±0.018) | 0.69(±0.015) | 0.749(±0.019) |
| 7→18 | 0.681(±0.047) | 0.715(±0.017) | 0.715(±0.017) | 0.727(±0.004) | 0.669(±0.019) | 0.741(±0.005) | 0.728(±0.025) | 0.738(±0.004) | 0.75(±0.026) |
| 9→14 | 0.74(±0.07) | 0.85(±0.008) | 0.85(±0.008) | 0.848(±0.014) | 0.797(±0.021) | 0.772(±0.064) | 0.672(±0.229) | 0.841(±0.006) | 0.841(±0.008) |
| Avg. | 0.653(±0.058) | 0.736(±0.012) | 0.736(±0.012) | 0.723(±0.019) | 0.709(±0.031) | 0.716(±0.038) | 0.64(±0.066) | 0.729(±0.01) | 0.757(±0.03) |
MMDA (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.538(±0.032) | 0.52(±0.01) | 0.52(±0.01) | 0.523(±0.007) | 0.654(±0.025) | 0.596(±0.086) | 0.405(±0.099) | 0.612(±0.037) | 0.615(±0.025) |
| 12→5 | 0.749(±0.034) | 0.797(±0.016) | 0.797(±0.016) | 0.79(±0.025) | 0.798(±0.006) | 0.698(±0.079) | 0.747(±0.076) | 0.793(±0.01) | 0.823(±0.025) |
| 16→1 | 0.544(±0.116) | 0.705(±0.015) | 0.705(±0.015) | 0.7(±0.022) | 0.645(±0.032) | 0.703(±0.02) | 0.692(±0.008) | 0.7(±0.014) | 0.739(±0.061) |
| 7→18 | 0.678(±0.046) | 0.773(±0.01) | 0.773(±0.01) | 0.762(±0.0) | 0.68(±0.051) | 0.729(±0.094) | 0.702(±0.082) | 0.772(±0.005) | 0.797(±0.004) |
| 9→14 | 0.742(±0.071) | 0.84(±0.029) | 0.84(±0.029) | 0.839(±0.027) | 0.793(±0.014) | 0.792(±0.073) | 0.755(±0.051) | 0.85(±0.012) | 0.835(±0.012) |
| Avg. | 0.65(±0.06) | 0.727(±0.016) | 0.727(±0.016) | 0.723(±0.016) | 0.714(±0.026) | 0.704(±0.07) | 0.66(±0.063) | 0.745(±0.016) | 0.762(±0.025) |
Table 22: Mean and standard deviation (after ±) of target classification accuracy on Sleep-EDF (Part 2) over 3 repetitions with different random initializations of model weights.
CoDATS (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.568(±0.027) | 0.565(±0.118) | 0.565(±0.118) | 0.556(±0.107) | 0.625(±0.023) | 0.539(±0.155) | 0.564(±0.008) | 0.602(±0.027) | 0.6(±0.023) |
| 12→5 | 0.682(±0.043) | 0.812(±0.014) | 0.812(±0.014) | 0.805(±0.014) | 0.827(±0.012) | 0.75(±0.024) | 0.728(±0.018) | 0.809(±0.007) | 0.798(±0.012) |
| 16→1 | 0.574(±0.026) | 0.732(±0.016) | 0.732(±0.016) | 0.729(±0.007) | 0.642(±0.042) | 0.633(±0.066) | 0.676(±0.031) | 0.729(±0.004) | 0.735(±0.082) |
| 7→18 | 0.759(±0.039) | 0.749(±0.023) | 0.749(±0.023) | 0.75(±0.02) | 0.715(±0.014) | 0.724(±0.008) | 0.717(±0.092) | 0.742(±0.01) | 0.771(±0.02) |
| 9→14 | 0.779(±0.038) | 0.837(±0.013) | 0.837(±0.013) | 0.839(±0.008) | 0.805(±0.01) | 0.771(±0.026) | 0.763(±0.006) | 0.84(±0.014) | 0.811(±0.014) |
| Avg. | 0.672(±0.035) | 0.739(±0.037) | 0.739(±0.037) | 0.736(±0.031) | 0.723(±0.02) | 0.683(±0.056) | 0.69(±0.031) | 0.744(±0.012) | 0.743(±0.03) |
Deep-Coral (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.547(±0.032) | 0.586(±0.028) | 0.586(±0.028) | 0.6(±0.029) | 0.652(±0.058) | 0.629(±0.063) | 0.569(±0.11) | 0.592(±0.028) | 0.637(±0.058) |
| 12→5 | 0.694(±0.084) | 0.75(±0.02) | 0.75(±0.02) | 0.745(±0.016) | 0.781(±0.028) | 0.758(±0.036) | 0.758(±0.036) | 0.755(±0.018) | 0.776(±0.028) |
| 16→1 | 0.545(±0.097) | 0.68(±0.015) | 0.68(±0.015) | 0.671(±0.007) | 0.564(±0.034) | 0.605(±0.018) | 0.553(±0.104) | 0.662(±0.017) | 0.728(±0.036) |
| 7→18 | 0.698(±0.022) | 0.73(±0.042) | 0.73(±0.042) | 0.716(±0.02) | 0.673(±0.006) | 0.723(±0.027) | 0.711(±0.034) | 0.733(±0.013) | 0.741(±0.016) |
| 9→14 | 0.733(±0.047) | 0.836(±0.016) | 0.836(±0.016) | 0.826(±0.02) | 0.802(±0.018) | 0.786(±0.014) | 0.784(±0.043) | 0.82(±0.02) | 0.792(±0.016) |
| Avg. | 0.643(±0.056) | 0.717(±0.024) | 0.717(±0.024) | 0.712(±0.018) | 0.694(±0.029) | 0.7(±0.031) | 0.675(±0.065) | 0.713(±0.019) | 0.735(±0.031) |
CDAN (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.553(±0.024) | 0.499(±0.065) | 0.499(±0.065) | 0.507(±0.079) | 0.643(±0.053) | 0.529(±0.111) | 0.342(±0.024) | 0.565(±0.078) | 0.553(±0.053) |
| 12→5 | 0.658(±0.101) | 0.844(±0.01) | 0.844(±0.01) | 0.84(±0.007) | 0.829(±0.01) | 0.792(±0.06) | 0.671(±0.055) | 0.835(±0.002) | 0.85(±0.016) |
| 16→1 | 0.699(±0.077) | 0.729(±0.023) | 0.729(±0.023) | 0.726(±0.028) | 0.659(±0.05) | 0.626(±0.117) | 0.682(±0.029) | 0.706(±0.017) | 0.764(±0.084) |
| 7→18 | 0.647(±0.045) | 0.758(±0.017) | 0.758(±0.017) | 0.76(±0.016) | 0.704(±0.005) | 0.73(±0.004) | 0.754(±0.041) | 0.773(±0.012) | 0.794(±0.035) |
| 9→14 | 0.703(±0.041) | 0.865(±0.018) | 0.865(±0.018) | 0.862(±0.028) | 0.806(±0.032) | 0.81(±0.053) | 0.759(±0.041) | 0.862(±0.018) | 0.832(±0.018) |
| Avg. | 0.652(±0.058) | 0.739(±0.027) | 0.739(±0.027) | 0.739(±0.032) | 0.728(±0.03) | 0.697(±0.069) | 0.642(±0.038) | 0.748(±0.025) | 0.759(±0.041) |
DANN (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.551(±0.018) | 0.602(±0.058) | 0.602(±0.058) | 0.602(±0.052) | 0.672(±0.008) | 0.594(±0.081) | 0.495(±0.114) | 0.622(±0.04) | 0.574(±0.008) |
| 12→5 | 0.645(±0.075) | 0.801(±0.014) | 0.801(±0.014) | 0.792(±0.014) | 0.798(±0.018) | 0.686(±0.051) | 0.686(±0.051) | 0.784(±0.02) | 0.819(±0.054) |
| 16→1 | 0.57(±0.136) | 0.676(±0.002) | 0.676(±0.002) | 0.67(±0.008) | 0.593(±0.025) | 0.651(±0.055) | 0.651(±0.055) | 0.661(±0.008) | 0.747(±0.049) |
| 7→18 | 0.695(±0.037) | 0.725(±0.006) | 0.725(±0.006) | 0.725(±0.014) | 0.686(±0.013) | 0.703(±0.034) | 0.703(±0.034) | 0.732(±0.011) | 0.738(±0.018) |
| 9→14 | 0.746(±0.021) | 0.814(±0.002) | 0.814(±0.002) | 0.815(±0.015) | 0.822(±0.012) | 0.799(±0.039) | 0.686(±0.085) | 0.822(±0.01) | 0.81(±0.01) |
| Avg. | 0.641(±0.057) | 0.723(±0.016) | 0.723(±0.016) | 0.721(±0.02) | 0.714(±0.015) | 0.687(±0.052) | 0.644(±0.068) | 0.724(±0.018) | 0.738(±0.028) |
DSAN (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 0→11 | 0.547(±0.012) | 0.539(±0.035) | 0.539(±0.035) | 0.527(±0.063) | 0.668(±0.024) | 0.59(±0.055) | 0.311(±0.026) | 0.629(±0.051) | 0.603(±0.024) |
| 12→5 | 0.717(±0.086) | 0.84(±0.02) | 0.84(±0.02) | 0.841(±0.022) | 0.837(±0.026) | 0.822(±0.028) | 0.699(±0.061) | 0.842(±0.018) | 0.855(±0.007) |
| 16→1 | 0.697(±0.06) | 0.755(±0.031) | 0.755(±0.031) | 0.722(±0.02) | 0.612(±0.026) | 0.617(±0.118) | 0.617(±0.118) | 0.721(±0.014) | 0.742(±0.031) |
| 7→18 | 0.684(±0.048) | 0.734(±0.029) | 0.734(±0.029) | 0.741(±0.011) | 0.71(±0.067) | 0.749(±0.02) | 0.745(±0.025) | 0.764(±0.002) | 0.768(±0.059) |
| 9→14 | 0.62(±0.04) | 0.832(±0.024) | 0.832(±0.024) | 0.827(±0.018) | 0.815(±0.018) | 0.784(±0.046) | 0.576(±0.154) | 0.828(±0.021) | 0.823(±0.024) |
| Avg. | 0.653(±0.049) | 0.74(±0.028) | 0.74(±0.028) | 0.732(±0.027) | 0.728(±0.032) | 0.712(±0.053) | 0.589(±0.077) | 0.757(±0.021) | 0.758(±0.029) |
Table 23: Mean and standard deviation (after ±) of target classification accuracy on UCI-HAR (Part 1) over 3 repetitions with different random initializations of model weights.
HoMM (heuristic criteria and criteria with theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|------|----|-----|-----|-----|-----|-----|-----|------------|----|
| 12→16 | 0.674(±0.012) | 0.667(±0.036) | 0.667(±0.036) | 0.667(±0.036) | 0.688(±0.0) | 0.674(±0.012) | 0.674(±0.012) | 0.688(±0.0) | 0.715(±0.012) |
| 2→11 | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.938(±0.031) | 0.99(±0.018) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.854(±0.036) | 0.896(±0.0) | 0.896(±0.0) | 0.896(±0.0) | 0.889(±0.012) | 0.854(±0.036) | 0.854(±0.036) | 0.903(±0.012) | 0.938(±0.0) |
| 7→13 | 0.889(±0.052) | 0.91(±0.024) | 0.91(±0.024) | 0.91(±0.024) | 0.889(±0.052) | 0.889(±0.052) | 0.889(±0.052) | 0.91(±0.024) | 0.944(±0.024) |
| 9→18 | 0.493(±0.032) | 0.618(±0.105) | 0.618(±0.105) | 0.618(±0.105) | 0.514(±0.188) | 0.639(±0.043) | 0.583(±0.116) | 0.632(±0.079) | 0.715(±0.024) |
| Avg. | 0.782(±0.026) | 0.818(±0.033) | 0.818(±0.033) | 0.818(±0.033) | 0.783(±0.057) | 0.809(±0.032) | 0.8(±0.043) | 0.826(±0.023) | 0.862(±0.012) |
AdvSKM (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.59(±0.103) | 0.681(±0.012) | 0.681(±0.012) | 0.681(±0.012) | 0.708(±0.021) | 0.59(±0.103) | 0.59(±0.103) | 0.688(±0.0) | 0.722(±0.012) |
| 2→11 | 0.979(±0.018) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.854(±0.065) | 0.896(±0.1) | 0.917(±0.118) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.826(±0.084) | 0.896(±0.0) | 0.896(±0.0) | 0.896(±0.0) | 0.889(±0.012) | 0.826(±0.084) | 0.826(±0.084) | 0.896(±0.0) | 0.896(±0.0) |
| 7→13 | 0.833(±0.036) | 0.91(±0.024) | 0.91(±0.024) | 0.917(±0.036) | 0.896(±0.036) | 0.833(±0.036) | 0.833(±0.036) | 0.903(±0.024) | 0.924(±0.012) |
| 9→18 | 0.389(±0.151) | 0.514(±0.079) | 0.514(±0.079) | 0.556(±0.064) | 0.493(±0.064) | 0.389(±0.151) | 0.389(±0.151) | 0.514(±0.024) | 0.597(±0.052) |
| Avg. | 0.724(±0.078) | 0.8(±0.023) | 0.8(±0.023) | 0.81(±0.022) | 0.768(±0.04) | 0.707(±0.095) | 0.711(±0.098) | 0.8(±0.01) | 0.828(±0.015) |
DIRT (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.667(±0.075) | 0.778(±0.012) | 0.778(±0.012) | 0.75(±0.0) | 0.736(±0.012) | 0.667(±0.075) | 0.667(±0.075) | 0.764(±0.032) | 0.819(±0.064) |
| 2→11 | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.854(±0.079) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.903(±0.084) | 0.938(±0.0) | 0.938(±0.0) | 0.938(±0.0) | 0.847(±0.06) | 0.847(±0.06) | 0.847(±0.06) | 0.938(±0.0) | 0.938(±0.0) |
| 7→13 | 0.833(±0.0) | 0.958(±0.0) | 0.958(±0.0) | 0.958(±0.0) | 0.924(±0.06) | 0.854(±0.036) | 0.861(±0.032) | 0.958(±0.0) | 0.958(±0.0) |
| 9→18 | 0.514(±0.024) | 0.861(±0.122) | 0.861(±0.122) | 0.806(±0.032) | 0.417(±0.075) | 0.667(±0.225) | 0.667(±0.225) | 0.84(±0.048) | 0.986(±0.012) |
| Avg. | 0.783(±0.037) | 0.907(±0.027) | 0.907(±0.027) | 0.89(±0.006) | 0.756(±0.057) | 0.807(±0.079) | 0.808(±0.078) | 0.9(±0.016) | 0.94(±0.015) |
DDC (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.681(±0.024) | 0.681(±0.012) | 0.681(±0.012) | 0.674(±0.012) | 0.674(±0.012) | 0.681(±0.024) | 0.604(±0.108) | 0.681(±0.012) | 0.715(±0.073) |
| 2→11 | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.74(±0.213) | 0.74(±0.213) | 0.802(±0.266) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.889(±0.012) | 0.896(±0.0) | 0.896(±0.0) | 0.896(±0.0) | 0.882(±0.012) | 0.889(±0.012) | 0.896(±0.021) | 0.896(±0.0) | 0.889(±0.0) |
| 7→13 | 0.868(±0.012) | 0.896(±0.036) | 0.896(±0.036) | 0.91(±0.032) | 0.903(±0.024) | 0.868(±0.012) | 0.868(±0.012) | 0.903(±0.024) | 0.917(±0.021) |
| 9→18 | 0.514(±0.043) | 0.562(±0.091) | 0.562(±0.091) | 0.569(±0.115) | 0.583(±0.165) | 0.444(±0.103) | 0.5(±0.127) | 0.542(±0.091) | 0.59(±0.189) |
| Avg. | 0.79(±0.018) | 0.807(±0.028) | 0.807(±0.028) | 0.81(±0.032) | 0.756(±0.085) | 0.724(±0.073) | 0.734(±0.107) | 0.804(±0.025) | 0.822(±0.057) |
CMD (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.694(±0.012) | 0.729(±0.0) | 0.729(±0.0) | 0.722(±0.012) | 0.688(±0.0) | 0.694(±0.012) | 0.694(±0.012) | 0.729(±0.0) | 0.75(±0.036) |
| 2→11 | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.771(±0.036) | 0.812(±0.272) | 0.812(±0.272) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.896(±0.042) | 0.938(±0.0) | 0.938(±0.0) | 0.931(±0.012) | 0.889(±0.012) | 0.903(±0.032) | 0.896(±0.042) | 0.938(±0.0) | 0.938(±0.0) |
| 7→13 | 0.826(±0.073) | 0.938(±0.0) | 0.938(±0.0) | 0.938(±0.0) | 0.91(±0.024) | 0.826(±0.073) | 0.826(±0.073) | 0.938(±0.0) | 0.944(±0.012) |
| 9→18 | 0.521(±0.095) | 0.639(±0.043) | 0.639(±0.043) | 0.604(±0.055) | 0.396(±0.11) | 0.785(±0.146) | 0.833(±0.163) | 0.604(±0.075) | 0.882(±0.079) |
| Avg. | 0.788(±0.044) | 0.849(±0.009) | 0.849(±0.009) | 0.839(±0.016) | 0.731(±0.036) | 0.804(±0.107) | 0.812(±0.112) | 0.842(±0.015) | 0.903(±0.025) |
MMDA (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.674(±0.012) | 0.694(±0.012) | 0.694(±0.012) | 0.688(±0.0) | 0.681(±0.012) | 0.674(±0.012) | 0.674(±0.012) | 0.688(±0.0) | 0.757(±0.043) |
| 2→11 | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.844(±0.125) | 0.927(±0.065) | 0.927(±0.065) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.847(±0.032) | 0.896(±0.0) | 0.896(±0.0) | 0.896(±0.0) | 0.847(±0.067) | 0.847(±0.032) | 0.847(±0.032) | 0.896(±0.0) | 0.91(±0.024) |
| 7→13 | 0.882(±0.043) | 0.903(±0.012) | 0.903(±0.012) | 0.917(±0.021) | 0.924(±0.024) | 0.882(±0.043) | 0.882(±0.043) | 0.931(±0.024) | 0.944(±0.012) |
| 9→18 | 0.521(±0.072) | 0.569(±0.139) | 0.569(±0.139) | 0.5(±0.104) | 0.5(±0.095) | 0.535(±0.064) | 0.507(±0.024) | 0.521(±0.091) | 0.688(±0.075) |
| Avg. | 0.785(±0.032) | 0.812(±0.033) | 0.812(±0.033) | 0.8(±0.025) | 0.759(±0.065) | 0.773(±0.043) | 0.767(±0.035) | 0.807(±0.023) | 0.86(±0.031) |
Table 24: Mean and standard deviation (after ± ) of target classification accuracy on UCI-HAR (Part 2) over 3 repetitions with different random initialization of model weights.
CoDATS (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.681(±0.024) | 0.701(±0.024) | 0.701(±0.024) | 0.688(±0.0) | 0.694(±0.024) | 0.681(±0.024) | 0.694(±0.024) | 0.694(±0.012) | 0.708(±0.021) |
| 2→11 | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.865(±0.11) | 0.865(±0.1) | 0.865(±0.1) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.84(±0.032) | 0.889(±0.032) | 0.889(±0.032) | 0.889(±0.032) | 0.882(±0.064) | 0.84(±0.048) | 0.896(±0.083) | 0.924(±0.012) | 0.951(±0.043) |
| 7→13 | 0.75(±0.217) | 0.944(±0.012) | 0.944(±0.012) | 0.944(±0.012) | 0.917(±0.036) | 0.854(±0.036) | 0.75(±0.217) | 0.938(±0.0) | 0.951(±0.012) |
| 9→18 | 0.528(±0.032) | 0.625(±0.062) | 0.625(±0.062) | 0.639(±0.043) | 0.569(±0.305) | 0.764(±0.12) | 0.764(±0.12) | 0.674(±0.115) | 0.826(±0.012) |
| Avg. | 0.76(±0.061) | 0.832(±0.026) | 0.832(±0.026) | 0.832(±0.017) | 0.785(±0.108) | 0.801(±0.066) | 0.794(±0.109) | 0.846(±0.028) | 0.888(±0.018) |
Deep-Coral (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.681(±0.012) | 0.681(±0.012) | 0.681(±0.012) | 0.674(±0.012) | 0.681(±0.012) | 0.681(±0.012) | 0.639(±0.084) | 0.688(±0.0) | 0.688(±0.0) |
| 2→11 | 0.969(±0.054) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.875(±0.094) | 0.875(±0.083) | 0.885(±0.095) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.826(±0.084) | 0.882(±0.012) | 0.882(±0.012) | 0.889(±0.012) | 0.896(±0.0) | 0.826(±0.084) | 0.833(±0.091) | 0.896(±0.0) | 0.903(±0.043) |
| 7→13 | 0.917(±0.036) | 0.91(±0.032) | 0.91(±0.032) | 0.91(±0.032) | 0.896(±0.021) | 0.917(±0.036) | 0.917(±0.036) | 0.903(±0.024) | 0.931(±0.012) |
| 9→18 | 0.556(±0.087) | 0.528(±0.043) | 0.528(±0.043) | 0.569(±0.115) | 0.507(±0.272) | 0.542(±0.144) | 0.59(±0.032) | 0.556(±0.064) | 0.59(±0.221) |
| Avg. | 0.79(±0.055) | 0.8(±0.02) | 0.8(±0.02) | 0.808(±0.034) | 0.771(±0.08) | 0.768(±0.072) | 0.773(±0.068) | 0.808(±0.018) | 0.822(±0.055) |
CDAN (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.674(±0.043) | 0.722(±0.012) | 0.722(±0.012) | 0.722(±0.012) | 0.694(±0.012) | 0.674(±0.043) | 0.646(±0.055) | 0.715(±0.024) | 0.729(±0.036) |
| 2→11 | 0.99(±0.018) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.865(±0.065) | 0.792(±0.28) | 0.802(±0.29) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.854(±0.042) | 0.924(±0.024) | 0.924(±0.024) | 0.91(±0.024) | 0.91(±0.024) | 0.847(±0.032) | 0.708(±0.295) | 0.924(±0.024) | 0.924(±0.024) |
| 7→13 | 0.826(±0.139) | 0.951(±0.012) | 0.951(±0.012) | 0.951(±0.012) | 0.944(±0.012) | 0.896(±0.021) | 0.812(±0.108) | 0.951(±0.012) | 0.958(±0.0) |
| 9→18 | 0.438(±0.055) | 0.618(±0.064) | 0.618(±0.064) | 0.618(±0.064) | 0.597(±0.098) | 0.694(±0.126) | 0.465(±0.087) | 0.639(±0.048) | 0.729(±0.083) |
| Avg. | 0.756(±0.059) | 0.843(±0.022) | 0.843(±0.022) | 0.84(±0.022) | 0.802(±0.042) | 0.781(±0.1) | 0.687(±0.167) | 0.846(±0.022) | 0.868(±0.029) |
DANN (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.688(±0.021) | 0.701(±0.024) | 0.701(±0.024) | 0.701(±0.024) | 0.715(±0.012) | 0.701(±0.012) | 0.701(±0.012) | 0.729(±0.0) | 0.722(±0.0) |
| 2→11 | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.729(±0.141) | 0.76(±0.148) | 0.802(±0.203) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.882(±0.024) | 0.917(±0.021) | 0.917(±0.021) | 0.931(±0.012) | 0.889(±0.032) | 0.868(±0.024) | 0.868(±0.024) | 0.931(±0.012) | 0.958(±0.036) |
| 7→13 | 0.806(±0.127) | 0.944(±0.012) | 0.944(±0.012) | 0.944(±0.012) | 0.938(±0.021) | 0.903(±0.024) | 0.944(±0.048) | 0.944(±0.012) | 0.965(±0.012) |
| 9→18 | 0.403(±0.12) | 0.715(±0.012) | 0.715(±0.012) | 0.701(±0.067) | 0.729(±0.127) | 0.583(±0.11) | 0.583(±0.11) | 0.639(±0.103) | 0.674(±0.127) |
| Avg. | 0.756(±0.058) | 0.856(±0.014) | 0.856(±0.014) | 0.856(±0.023) | 0.8(±0.066) | 0.763(±0.064) | 0.78(±0.08) | 0.849(±0.025) | 0.864(±0.035) |
DSAN (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 12→16 | 0.681(±0.012) | 0.688(±0.021) | 0.688(±0.021) | 0.715(±0.024) | 0.694(±0.012) | 0.681(±0.012) | 0.556(±0.229) | 0.715(±0.024) | 0.736(±0.012) |
| 2→11 | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 0.969(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) | 1.0(±0.0) |
| 6→23 | 0.764(±0.094) | 0.938(±0.0) | 0.938(±0.0) | 0.917(±0.021) | 0.847(±0.052) | 0.785(±0.103) | 0.785(±0.103) | 0.931(±0.012) | 0.938(±0.0) |
| 7→13 | 0.854(±0.036) | 0.951(±0.012) | 0.951(±0.012) | 0.951(±0.012) | 0.854(±0.036) | 0.854(±0.036) | 0.854(±0.036) | 0.951(±0.012) | 0.958(±0.0) |
| 9→18 | 0.514(±0.032) | 0.639(±0.079) | 0.639(±0.079) | 0.688(±0.091) | 0.382(±0.012) | 0.556(±0.067) | 0.528(±0.067) | 0.694(±0.032) | 0.743(±0.087) |
| Avg. | 0.762(±0.035) | 0.843(±0.022) | 0.843(±0.022) | 0.854(±0.03) | 0.749(±0.023) | 0.775(±0.044) | 0.744(±0.087) | 0.858(±0.016) | 0.875(±0.02) |
Table 25: Mean and standard deviation (after ± ) of target classification accuracy on HHAR (Part 1) over 3 repetitions with different random initialization of model weights.
HoMM (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.719(±0.041) | 0.737(±0.019) | 0.737(±0.019) | 0.735(±0.024) | 0.635(±0.122) | 0.704(±0.011) | 0.703(±0.013) | 0.732(±0.01) | 0.733(±0.019) |
| 1→6 | 0.788(±0.11) | 0.871(±0.011) | 0.871(±0.011) | 0.869(±0.013) | 0.788(±0.163) | 0.749(±0.06) | 0.785(±0.106) | 0.879(±0.011) | 0.914(±0.01) |
| 2→7 | 0.528(±0.103) | 0.457(±0.007) | 0.457(±0.007) | 0.461(±0.007) | 0.469(±0.051) | 0.497(±0.105) | 0.516(±0.087) | 0.455(±0.015) | 0.546(±0.071) |
| 3→8 | 0.797(±0.007) | 0.818(±0.002) | 0.818(±0.002) | 0.816(±0.007) | 0.812(±0.014) | 0.805(±0.01) | 0.805(±0.01) | 0.818(±0.005) | 0.831(±0.022) |
| 4→5 | 0.861(±0.022) | 0.914(±0.007) | 0.914(±0.007) | 0.911(±0.005) | 0.798(±0.164) | 0.844(±0.016) | 0.855(±0.01) | 0.911(±0.005) | 0.94(±0.04) |
| Avg. | 0.739(±0.057) | 0.759(±0.009) | 0.759(±0.009) | 0.759(±0.011) | 0.7(±0.103) | 0.72(±0.041) | 0.733(±0.045) | 0.759(±0.009) | 0.793(±0.032) |
AdvSKM (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.661(±0.035) | 0.722(±0.009) | 0.722(±0.009) | 0.725(±0.007) | 0.629(±0.076) | 0.699(±0.013) | 0.699(±0.013) | 0.718(±0.009) | 0.731(±0.013) |
| 1→6 | 0.821(±0.046) | 0.849(±0.002) | 0.849(±0.002) | 0.851(±0.006) | 0.604(±0.045) | 0.806(±0.026) | 0.806(±0.026) | 0.857(±0.006) | 0.838(±0.018) |
| 2→7 | 0.455(±0.07) | 0.473(±0.031) | 0.473(±0.031) | 0.484(±0.03) | 0.531(±0.062) | 0.497(±0.067) | 0.497(±0.067) | 0.49(±0.014) | 0.574(±0.112) |
| 3→8 | 0.79(±0.011) | 0.799(±0.008) | 0.799(±0.008) | 0.803(±0.005) | 0.805(±0.01) | 0.799(±0.005) | 0.788(±0.023) | 0.81(±0.002) | 0.818(±0.012) |
| 4→5 | 0.862(±0.055) | 0.866(±0.013) | 0.866(±0.013) | 0.875(±0.008) | 0.809(±0.094) | 0.85(±0.025) | 0.85(±0.025) | 0.884(±0.005) | 0.895(±0.014) |
| Avg. | 0.718(±0.044) | 0.742(±0.013) | 0.742(±0.013) | 0.748(±0.011) | 0.676(±0.058) | 0.73(±0.027) | 0.728(±0.031) | 0.752(±0.007) | 0.771(±0.034) |
DIRT (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.708(±0.011) | 0.571(±0.079) | 0.571(±0.079) | 0.633(±0.137) | 0.629(±0.085) | 0.708(±0.011) | 0.708(±0.011) | 0.779(±0.011) | 0.739(±0.011) |
| 1→6 | 0.756(±0.058) | 0.938(±0.004) | 0.938(±0.004) | 0.938(±0.004) | 0.904(±0.058) | 0.814(±0.056) | 0.756(±0.058) | 0.938(±0.0) | 0.942(±0.004) |
| 2→7 | 0.531(±0.057) | 0.622(±0.091) | 0.622(±0.091) | 0.621(±0.09) | 0.658(±0.03) | 0.548(±0.049) | 0.548(±0.049) | 0.53(±0.025) | 0.688(±0.004) |
| 3→8 | 0.807(±0.008) | 0.846(±0.022) | 0.846(±0.022) | 0.837(±0.015) | 0.857(±0.087) | 0.807(±0.008) | 0.807(±0.008) | 0.848(±0.007) | 0.911(±0.07) |
| 4→5 | 0.839(±0.016) | 0.984(±0.004) | 0.984(±0.004) | 0.984(±0.004) | 0.93(±0.061) | 0.839(±0.016) | 0.878(±0.074) | 0.984(±0.004) | 0.984(±0.002) |
| Avg. | 0.728(±0.03) | 0.792(±0.04) | 0.792(±0.04) | 0.803(±0.05) | 0.796(±0.064) | 0.743(±0.028) | 0.739(±0.04) | 0.816(±0.009) | 0.853(±0.018) |
DDC (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.575(±0.083) | 0.693(±0.017) | 0.693(±0.017) | 0.672(±0.009) | 0.632(±0.038) | 0.646(±0.015) | 0.6(±0.084) | 0.697(±0.013) | 0.651(±0.013) |
| 1→6 | 0.875(±0.014) | 0.888(±0.007) | 0.888(±0.007) | 0.894(±0.013) | 0.886(±0.019) | 0.856(±0.067) | 0.856(±0.067) | 0.882(±0.017) | 0.899(±0.016) |
| 2→7 | 0.487(±0.016) | 0.457(±0.03) | 0.457(±0.03) | 0.455(±0.029) | 0.497(±0.028) | 0.438(±0.054) | 0.438(±0.054) | 0.439(±0.041) | 0.533(±0.107) |
| 3→8 | 0.815(±0.019) | 0.805(±0.014) | 0.805(±0.014) | 0.807(±0.016) | 0.797(±0.022) | 0.827(±0.025) | 0.831(±0.02) | 0.818(±0.008) | 0.822(±0.02) |
| 4→5 | 0.831(±0.048) | 0.908(±0.011) | 0.908(±0.011) | 0.911(±0.01) | 0.772(±0.085) | 0.792(±0.02) | 0.801(±0.012) | 0.905(±0.008) | 0.888(±0.01) |
| Avg. | 0.716(±0.036) | 0.75(±0.016) | 0.75(±0.016) | 0.748(±0.015) | 0.717(±0.038) | 0.711(±0.036) | 0.705(±0.047) | 0.748(±0.017) | 0.758(±0.033) |
CMD (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.703(±0.04) | 0.643(±0.046) | 0.643(±0.046) | 0.679(±0.054) | 0.672(±0.103) | 0.743(±0.021) | 0.743(±0.021) | 0.693(±0.012) | 0.724(±0.021) |
| 1→6 | 0.861(±0.035) | 0.915(±0.009) | 0.915(±0.009) | 0.912(±0.0) | 0.826(±0.103) | 0.899(±0.03) | 0.843(±0.156) | 0.907(±0.006) | 0.925(±0.004) |
| 2→7 | 0.573(±0.018) | 0.494(±0.011) | 0.494(±0.011) | 0.499(±0.023) | 0.51(±0.066) | 0.577(±0.011) | 0.557(±0.039) | 0.482(±0.008) | 0.603(±0.016) |
| 3→8 | 0.799(±0.016) | 0.816(±0.0) | 0.816(±0.0) | 0.816(±0.0) | 0.797(±0.004) | 0.799(±0.016) | 0.738(±0.109) | 0.811(±0.011) | 0.822(±0.01) |
| 4→5 | 0.806(±0.024) | 0.952(±0.016) | 0.952(±0.016) | 0.93(±0.007) | 0.879(±0.054) | 0.854(±0.063) | 0.332(±0.051) | 0.939(±0.002) | 0.961(±0.024) |
| Avg. | 0.748(±0.026) | 0.764(±0.016) | 0.764(±0.016) | 0.767(±0.017) | 0.737(±0.066) | 0.775(±0.028) | 0.643(±0.075) | 0.766(±0.008) | 0.807(±0.015) |
MMDA (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.706(±0.042) | 0.737(±0.021) | 0.737(±0.021) | 0.737(±0.018) | 0.704(±0.037) | 0.699(±0.055) | 0.725(±0.018) | 0.746(±0.008) | 0.732(±0.008) |
| 1→6 | 0.792(±0.098) | 0.91(±0.002) | 0.91(±0.002) | 0.91(±0.002) | 0.742(±0.147) | 0.732(±0.009) | 0.722(±0.009) | 0.897(±0.002) | 0.914(±0.028) |
| 2→7 | 0.552(±0.073) | 0.481(±0.003) | 0.481(±0.003) | 0.482(±0.004) | 0.515(±0.058) | 0.496(±0.027) | 0.543(±0.055) | 0.488(±0.01) | 0.552(±0.073) |
| 3→8 | 0.79(±0.016) | 0.862(±0.026) | 0.862(±0.026) | 0.841(±0.01) | 0.794(±0.022) | 0.802(±0.018) | 0.802(±0.018) | 0.839(±0.012) | 0.932(±0.005) |
| 4→5 | 0.852(±0.036) | 0.917(±0.016) | 0.917(±0.016) | 0.931(±0.002) | 0.733(±0.061) | 0.865(±0.055) | 0.865(±0.055) | 0.928(±0.006) | 0.947(±0.022) |
| Avg. | 0.738(±0.053) | 0.781(±0.014) | 0.781(±0.014) | 0.78(±0.007) | 0.698(±0.065) | 0.719(±0.033) | 0.731(±0.031) | 0.78(±0.008) | 0.815(±0.027) |
Table 26: Mean and standard deviation (after ± ) of target classification accuracy on HHAR (Part 2) over 3 repetitions with different random initialization of model weights.
CoDATS (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.64(±0.096) | 0.497(±0.005) | 0.497(±0.005) | 0.499(±0.006) | 0.582(±0.122) | 0.689(±0.069) | 0.606(±0.106) | 0.718(±0.017) | 0.735(±0.01) |
| 1→6 | 0.775(±0.091) | 0.946(±0.0) | 0.946(±0.0) | 0.946(±0.0) | 0.683(±0.076) | 0.896(±0.069) | 0.896(±0.069) | 0.939(±0.006) | 0.947(±0.009) |
| 2→7 | 0.527(±0.076) | 0.473(±0.012) | 0.473(±0.012) | 0.475(±0.01) | 0.534(±0.052) | 0.472(±0.019) | 0.555(±0.069) | 0.472(±0.005) | 0.558(±0.041) |
| 3→8 | 0.783(±0.015) | 0.971(±0.018) | 0.971(±0.018) | 0.969(±0.018) | 0.97(±0.025) | 0.789(±0.01) | 0.789(±0.01) | 0.96(±0.03) | 0.987(±0.002) |
| 4→5 | 0.827(±0.039) | 0.971(±0.013) | 0.971(±0.013) | 0.978(±0.002) | 0.841(±0.1) | 0.849(±0.074) | 0.849(±0.074) | 0.973(±0.004) | 0.979(±0.006) |
| Avg. | 0.71(±0.063) | 0.772(±0.009) | 0.772(±0.009) | 0.778(±0.007) | 0.722(±0.075) | 0.739(±0.048) | 0.739(±0.066) | 0.812(±0.012) | 0.841(±0.014) |
Deep-Coral (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.692(±0.026) | 0.731(±0.028) | 0.731(±0.028) | 0.736(±0.024) | 0.632(±0.024) | 0.703(±0.038) | 0.7(±0.038) | 0.728(±0.019) | 0.703(±0.024) |
| 1→6 | 0.862(±0.018) | 0.893(±0.009) | 0.893(±0.009) | 0.896(±0.004) | 0.822(±0.104) | 0.864(±0.017) | 0.864(±0.017) | 0.886(±0.009) | 0.911(±0.017) |
| 2→7 | 0.509(±0.097) | 0.454(±0.02) | 0.454(±0.02) | 0.457(±0.025) | 0.51(±0.057) | 0.499(±0.079) | 0.539(±0.076) | 0.46(±0.012) | 0.565(±0.117) |
| 3→8 | 0.798(±0.016) | 0.801(±0.004) | 0.801(±0.004) | 0.803(±0.006) | 0.793(±0.024) | 0.799(±0.013) | 0.802(±0.016) | 0.812(±0.0) | 0.822(±0.01) |
| 4→5 | 0.862(±0.036) | 0.932(±0.025) | 0.932(±0.025) | 0.938(±0.022) | 0.648(±0.108) | 0.906(±0.027) | 0.887(±0.035) | 0.936(±0.016) | 0.96(±0.01) |
| Avg. | 0.745(±0.039) | 0.762(±0.017) | 0.762(±0.017) | 0.766(±0.016) | 0.681(±0.063) | 0.754(±0.035) | 0.758(±0.036) | 0.764(±0.011) | 0.792(±0.036) |
CDAN (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.717(±0.004) | 0.483(±0.008) | 0.483(±0.008) | 0.481(±0.006) | 0.699(±0.021) | 0.692(±0.083) | 0.692(±0.083) | 0.76(±0.016) | 0.718(±0.016) |
| 1→6 | 0.742(±0.09) | 0.932(±0.006) | 0.932(±0.006) | 0.932(±0.01) | 0.906(±0.021) | 0.942(±0.011) | 0.944(±0.013) | 0.933(±0.0) | 0.946(±0.0) |
| 2→7 | 0.554(±0.031) | 0.522(±0.05) | 0.522(±0.05) | 0.56(±0.059) | 0.58(±0.077) | 0.561(±0.068) | 0.561(±0.068) | 0.531(±0.066) | 0.624(±0.014) |
| 3→8 | 0.77(±0.069) | 0.872(±0.09) | 0.872(±0.09) | 0.871(±0.091) | 0.812(±0.007) | 0.801(±0.004) | 0.801(±0.004) | 0.874(±0.076) | 0.987(±0.006) |
| 4→5 | 0.859(±0.014) | 0.978(±0.005) | 0.978(±0.005) | 0.979(±0.002) | 0.828(±0.106) | 0.875(±0.087) | 0.875(±0.087) | 0.98(±0.0) | 0.982(±0.002) |
| Avg. | 0.728(±0.042) | 0.758(±0.032) | 0.758(±0.032) | 0.764(±0.034) | 0.765(±0.046) | 0.774(±0.051) | 0.775(±0.051) | 0.816(±0.031) | 0.851(±0.008) |
DANN (heuristic selection: TMV, TMR, TCR, SOR; theoretical error guarantees: IWV, DEV, IWA)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.704(±0.037) | 0.489(±0.006) | 0.489(±0.006) | 0.49(±0.006) | 0.585(±0.092) | 0.724(±0.016) | 0.724(±0.016) | 0.722(±0.071) | 0.711(±0.016) |
| 1→6 | 0.833(±0.036) | 0.94(±0.002) | 0.94(±0.002) | 0.938(±0.004) | 0.868(±0.142) | 0.929(±0.004) | 0.931(±0.006) | 0.936(±0.002) | 0.939(±0.002) |
| 2→7 | 0.591(±0.023) | 0.496(±0.004) | 0.496(±0.004) | 0.493(±0.011) | 0.557(±0.061) | 0.618(±0.029) | 0.592(±0.022) | 0.49(±0.003) | 0.635(±0.025) |
| 3→8 | 0.809(±0.02) | 0.966(±0.015) | 0.966(±0.015) | 0.966(±0.015) | 0.823(±0.069) | 0.796(±0.03) | 0.796(±0.03) | 0.964(±0.006) | 0.983(±0.002) |
| 4→5 | 0.846(±0.022) | 0.98(±0.0) | 0.98(±0.0) | 0.98(±0.0) | 0.779(±0.138) | 0.922(±0.078) | 0.922(±0.078) | 0.98(±0.0) | 0.98(±0.0) |
| Avg. | 0.757(±0.027) | 0.774(±0.006) | 0.774(±0.006) | 0.773(±0.007) | 0.722(±0.1) | 0.798(±0.031) | 0.793(±0.031) | 0.818(±0.016) | 0.85(±0.009) |
Task: DSAN (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 0→6 | 0.653(±0.033) | 0.607(±0.057) | 0.607(±0.057) | 0.635(±0.119) | 0.597(±0.075) | 0.572(±0.103) | 0.471(±0.259) | 0.747(±0.027) | 0.664(±0.027) |
| 1→6 | 0.806(±0.083) | 0.922(±0.002) | 0.922(±0.002) | 0.925(±0.004) | 0.821(±0.105) | 0.908(±0.065) | 0.908(±0.065) | 0.929(±0.0) | 0.94(±0.005) |
| 2→7 | 0.49(±0.053) | 0.496(±0.004) | 0.496(±0.004) | 0.494(±0.003) | 0.485(±0.005) | 0.482(±0.012) | 0.49(±0.003) | 0.496(±0.0) | 0.586(±0.083) |
| 3→8 | 0.797(±0.016) | 0.979(±0.002) | 0.979(±0.002) | 0.977(±0.004) | 0.855(±0.097) | 0.816(±0.01) | 0.639(±0.454) | 0.971(±0.005) | 0.982(±0.01) |
| 4→5 | 0.861(±0.043) | 0.98(±0.008) | 0.98(±0.008) | 0.979(±0.006) | 0.863(±0.085) | 0.928(±0.08) | 0.471(±0.438) | 0.98(±0.0) | 0.982(±0.005) |
| Avg. | 0.721(±0.046) | 0.797(±0.015) | 0.797(±0.015) | 0.802(±0.027) | 0.724(±0.073) | 0.741(±0.054) | 0.596(±0.244) | 0.825(±0.006) | 0.831(±0.026) |
Table 27: Mean and standard deviation (after ± ) of target classification accuracy on WISDM (Part 1) over 3 repetitions with different random initialization of model weights.
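Each cell in these tables is a mean and a dispersion over the 3 repetitions; a minimal sketch with toy accuracy values (not the thesis numbers) — whether the reported ± is the population or the sample standard deviation is an assumption here:

```python
import numpy as np

# Toy example (not the thesis numbers): target-domain accuracies of one
# method on one task across 3 repetitions with different random seeds.
accs = np.array([0.742, 0.932, 0.946])

mean = float(accs.mean())
# The tables report a value after "±"; whether the thesis uses the
# population (ddof=0) or the sample (ddof=1) standard deviation is an
# assumption made for this illustration.
std = float(accs.std(ddof=0))

print(f"{mean:.3f}(±{std:.3f})")
```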
Task: HoMM (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.733(±0.058) | 0.744(±0.038) | 0.744(±0.038) | 0.756(±0.051) | 0.733(±0.033) | 0.733(±0.058) | 0.733(±0.058) | 0.711(±0.019) | 0.767(±0.033) |
| 20→30 | 0.853(±0.022) | 0.814(±0.022) | 0.814(±0.022) | 0.801(±0.011) | 0.827(±0.0) | 0.853(±0.022) | 0.853(±0.022) | 0.814(±0.011) | 0.853(±0.022) |
| 35→31 | 0.579(±0.055) | 0.77(±0.014) | 0.77(±0.014) | 0.77(±0.014) | 0.722(±0.05) | 0.579(±0.055) | 0.579(±0.055) | 0.746(±0.027) | 0.77(±0.014) |
| 6→19 | 0.889(±0.009) | 0.864(±0.052) | 0.864(±0.052) | 0.823(±0.035) | 0.884(±0.017) | 0.889(±0.009) | 0.889(±0.009) | 0.823(±0.035) | 0.894(±0.026) |
| 7→18 | 0.711(±0.066) | 0.497(±0.011) | 0.497(±0.011) | 0.547(±0.068) | 0.711(±0.029) | 0.711(±0.066) | 0.648(±0.123) | 0.547(±0.0) | 0.711(±0.066) |
| Avg. | 0.753(±0.042) | 0.738(±0.028) | 0.738(±0.028) | 0.739(±0.036) | 0.775(±0.026) | 0.753(±0.042) | 0.74(±0.053) | 0.728(±0.019) | 0.799(±0.032) |
Task: AdvSKM (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.711(±0.019) | 0.744(±0.038) | 0.744(±0.038) | 0.744(±0.038) | 0.767(±0.033) | 0.711(±0.019) | 0.711(±0.019) | 0.744(±0.019) | 0.778(±0.096) |
| 20→30 | 0.872(±0.029) | 0.885(±0.033) | 0.885(±0.033) | 0.885(±0.033) | 0.872(±0.059) | 0.872(±0.029) | 0.872(±0.029) | 0.853(±0.029) | 0.885(±0.033) |
| 35→31 | 0.619(±0.041) | 0.698(±0.027) | 0.698(±0.027) | 0.69(±0.082) | 0.675(±0.096) | 0.619(±0.041) | 0.619(±0.041) | 0.698(±0.027) | 0.714(±0.024) |
| 6→19 | 0.818(±0.131) | 0.894(±0.0) | 0.894(±0.0) | 0.869(±0.044) | 0.833(±0.052) | 0.818(±0.131) | 0.818(±0.131) | 0.874(±0.009) | 0.894(±0.0) |
| 7→18 | 0.717(±0.05) | 0.686(±0.058) | 0.686(±0.058) | 0.704(±0.039) | 0.566(±0.068) | 0.717(±0.05) | 0.717(±0.05) | 0.717(±0.019) | 0.736(±0.086) |
| Avg. | 0.747(±0.054) | 0.781(±0.031) | 0.781(±0.031) | 0.779(±0.047) | 0.742(±0.062) | 0.747(±0.054) | 0.747(±0.054) | 0.777(±0.021) | 0.801(±0.048) |
Task: DIRT (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.711(±0.038) | 0.733(±0.033) | 0.733(±0.033) | 0.733(±0.0) | 0.711(±0.019) | 0.711(±0.038) | 0.711(±0.038) | 0.733(±0.0) | 0.744(±0.019) |
| 20→30 | 0.872(±0.011) | 0.846(±0.0) | 0.846(±0.0) | 0.846(±0.0) | 0.891(±0.04) | 0.872(±0.011) | 0.859(±0.029) | 0.84(±0.011) | 0.917(±0.011) |
| 35→31 | 0.492(±0.036) | 0.746(±0.036) | 0.746(±0.036) | 0.746(±0.036) | 0.603(±0.107) | 0.492(±0.036) | 0.77(±0.069) | 0.746(±0.014) | 0.778(±0.027) |
| 6→19 | 0.879(±0.026) | 0.833(±0.052) | 0.833(±0.052) | 0.833(±0.052) | 0.869(±0.023) | 0.879(±0.026) | 0.879(±0.026) | 0.838(±0.049) | 0.889(±0.009) |
| 7→18 | 0.736(±0.082) | 0.824(±0.011) | 0.824(±0.011) | 0.824(±0.011) | 0.704(±0.093) | 0.736(±0.082) | 0.767(±0.109) | 0.83(±0.0) | 0.83(±0.0) |
| Avg. | 0.738(±0.039) | 0.797(±0.027) | 0.797(±0.027) | 0.797(±0.02) | 0.756(±0.057) | 0.738(±0.039) | 0.797(±0.054) | 0.798(±0.015) | 0.832(±0.013) |
Task: DDC (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.7(±0.067) | 0.733(±0.033) | 0.733(±0.033) | 0.767(±0.033) | 0.7(±0.0) | 0.7(±0.067) | 0.7(±0.067) | 0.744(±0.019) | 0.767(±0.0) |
| 20→30 | 0.833(±0.022) | 0.865(±0.0) | 0.865(±0.0) | 0.859(±0.011) | 0.833(±0.029) | 0.833(±0.022) | 0.833(±0.022) | 0.84(±0.011) | 0.885(±0.019) |
| 35→31 | 0.54(±0.036) | 0.722(±0.06) | 0.722(±0.06) | 0.746(±0.069) | 0.635(±0.096) | 0.54(±0.036) | 0.54(±0.036) | 0.746(±0.027) | 0.73(±0.027) |
| 6→19 | 0.884(±0.017) | 0.884(±0.017) | 0.884(±0.017) | 0.859(±0.032) | 0.869(±0.044) | 0.884(±0.017) | 0.884(±0.017) | 0.869(±0.023) | 0.884(±0.017) |
| 7→18 | 0.748(±0.093) | 0.692(±0.039) | 0.692(±0.039) | 0.704(±0.029) | 0.648(±0.079) | 0.748(±0.093) | 0.748(±0.093) | 0.711(±0.011) | 0.748(±0.093) |
| Avg. | 0.741(±0.047) | 0.779(±0.03) | 0.779(±0.03) | 0.787(±0.035) | 0.737(±0.05) | 0.741(±0.047) | 0.741(±0.047) | 0.782(±0.018) | 0.803(±0.031) |
Task: CMD (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.711(±0.038) | 0.678(±0.019) | 0.678(±0.019) | 0.644(±0.069) | 0.689(±0.107) | 0.689(±0.107) | 0.622(±0.069) | 0.7(±0.0) | 0.756(±0.038) |
| 20→30 | 0.853(±0.011) | 0.859(±0.022) | 0.859(±0.022) | 0.897(±0.029) | 0.885(±0.051) | 0.853(±0.011) | 0.859(±0.011) | 0.904(±0.0) | 0.91(±0.022) |
| 35→31 | 0.683(±0.014) | 0.722(±0.014) | 0.722(±0.014) | 0.722(±0.014) | 0.659(±0.014) | 0.683(±0.014) | 0.69(±0.024) | 0.722(±0.027) | 0.77(±0.014) |
| 6→19 | 0.742(±0.084) | 0.798(±0.009) | 0.798(±0.009) | 0.803(±0.015) | 0.833(±0.12) | 0.783(±0.118) | 0.682(±0.124) | 0.798(±0.035) | 0.813(±0.12) |
| 7→18 | 0.56(±0.044) | 0.767(±0.058) | 0.767(±0.058) | 0.767(±0.058) | 0.572(±0.061) | 0.56(±0.044) | 0.579(±0.066) | 0.742(±0.029) | 0.736(±0.058) |
| Avg. | 0.71(±0.038) | 0.765(±0.024) | 0.765(±0.024) | 0.767(±0.037) | 0.728(±0.071) | 0.713(±0.059) | 0.686(±0.059) | 0.773(±0.018) | 0.797(±0.05) |
Task: MMDA (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.756(±0.051) | 0.767(±0.033) | 0.767(±0.033) | 0.767(±0.033) | 0.756(±0.019) | 0.756(±0.051) | 0.756(±0.051) | 0.8(±0.0) | 0.889(±0.019) |
| 20→30 | 0.872(±0.011) | 0.878(±0.029) | 0.878(±0.029) | 0.872(±0.029) | 0.814(±0.029) | 0.872(±0.011) | 0.872(±0.011) | 0.878(±0.029) | 0.891(±0.04) |
| 35→31 | 0.571(±0.024) | 0.786(±0.0) | 0.786(±0.0) | 0.754(±0.036) | 0.683(±0.107) | 0.571(±0.024) | 0.571(±0.024) | 0.77(±0.014) | 0.746(±0.0) |
| 6→19 | 0.879(±0.015) | 0.843(±0.087) | 0.843(±0.087) | 0.747(±0.009) | 0.864(±0.04) | 0.879(±0.015) | 0.879(±0.015) | 0.854(±0.009) | 0.909(±0.015) |
| 7→18 | 0.717(±0.05) | 0.585(±0.019) | 0.585(±0.019) | 0.585(±0.019) | 0.654(±0.143) | 0.717(±0.05) | 0.673(±0.022) | 0.648(±0.011) | 0.723(±0.029) |
| Avg. | 0.759(±0.03) | 0.772(±0.034) | 0.772(±0.034) | 0.745(±0.025) | 0.754(±0.068) | 0.759(±0.03) | 0.75(±0.025) | 0.79(±0.013) | 0.832(±0.021) |
Table 28: Mean and standard deviation (after ± ) of target classification accuracy on WISDM (Part 2) over 3 repetitions with different random initialization of model weights.
Task: CoDATS (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.689(±0.051) | 0.722(±0.069) | 0.722(±0.069) | 0.738(±0.058) | 0.667(±0.033) | 0.678(±0.077) | 0.7(±0.067) | 0.756(±0.019) | 0.756(±0.019) |
| 20→30 | 0.846(±0.069) | 0.891(±0.029) | 0.891(±0.029) | 0.885(±0.038) | 0.776(±0.022) | 0.846(±0.069) | 0.846(±0.069) | 0.904(±0.019) | 0.904(±0.051) |
| 35→31 | 0.619(±0.095) | 0.738(±0.048) | 0.738(±0.048) | 0.722(±0.014) | 0.643(±0.124) | 0.619(±0.095) | 0.73(±0.036) | 0.714(±0.063) | 0.786(±0.024) |
| 6→19 | 0.727(±0.069) | 0.899(±0.038) | 0.899(±0.038) | 0.864(±0.069) | 0.742(±0.076) | 0.727(±0.069) | 0.727(±0.069) | 0.899(±0.032) | 0.924(±0.08) |
| 7→18 | 0.673(±0.071) | 0.535(±0.076) | 0.535(±0.076) | 0.553(±0.066) | 0.585(±0.098) | 0.673(±0.071) | 0.673(±0.071) | 0.547(±0.057) | 0.692(±0.142) |
| Avg. | 0.711(±0.071) | 0.757(±0.052) | 0.757(±0.052) | 0.751(±0.049) | 0.682(±0.071) | 0.709(±0.076) | 0.735(±0.063) | 0.764(±0.038) | 0.812(±0.063) |
Task: Deep-Coral (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.656(±0.077) | 0.689(±0.019) | 0.689(±0.019) | 0.689(±0.019) | 0.667(±0.058) | 0.656(±0.077) | 0.656(±0.077) | 0.689(±0.069) | 0.711(±0.051) |
| 20→30 | 0.827(±0.033) | 0.891(±0.029) | 0.891(±0.029) | 0.872(±0.044) | 0.84(±0.04) | 0.827(±0.033) | 0.827(±0.033) | 0.885(±0.033) | 0.904(±0.038) |
| 35→31 | 0.619(±0.086) | 0.69(±0.071) | 0.69(±0.071) | 0.667(±0.082) | 0.532(±0.122) | 0.619(±0.086) | 0.619(±0.086) | 0.69(±0.048) | 0.706(±0.06) |
| 6→19 | 0.722(±0.044) | 0.732(±0.053) | 0.732(±0.053) | 0.707(±0.035) | 0.717(±0.068) | 0.722(±0.044) | 0.722(±0.044) | 0.722(±0.009) | 0.788(±0.139) |
| 7→18 | 0.648(±0.047) | 0.61(±0.054) | 0.61(±0.054) | 0.629(±0.071) | 0.566(±0.082) | 0.648(±0.047) | 0.528(±0.218) | 0.629(±0.039) | 0.654(±0.029) |
| Avg. | 0.694(±0.057) | 0.723(±0.046) | 0.723(±0.046) | 0.713(±0.051) | 0.664(±0.074) | 0.694(±0.057) | 0.67(±0.091) | 0.723(±0.04) | 0.753(±0.063) |
Task: CDAN (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.767(±0.033) | 0.678(±0.038) | 0.678(±0.038) | 0.722(±0.038) | 0.756(±0.019) | 0.767(±0.033) | 0.767(±0.033) | 0.7(±0.0) | 0.767(±0.033) |
| 20→30 | 0.872(±0.029) | 0.853(±0.029) | 0.853(±0.029) | 0.84(±0.011) | 0.872(±0.04) | 0.872(±0.029) | 0.833(±0.04) | 0.84(±0.029) | 0.878(±0.044) |
| 35→31 | 0.563(±0.05) | 0.738(±0.041) | 0.738(±0.041) | 0.746(±0.036) | 0.651(±0.144) | 0.563(±0.05) | 0.722(±0.172) | 0.722(±0.014) | 0.77(±0.06) |
| 6→19 | 0.869(±0.044) | 0.833(±0.052) | 0.833(±0.052) | 0.838(±0.049) | 0.838(±0.049) | 0.869(±0.044) | 0.717(±0.306) | 0.848(±0.045) | 0.869(±0.044) |
| 7→18 | 0.73(±0.093) | 0.711(±0.115) | 0.711(±0.115) | 0.761(±0.039) | 0.723(±0.058) | 0.73(±0.093) | 0.711(±0.126) | 0.717(±0.068) | 0.767(±0.076) |
| Avg. | 0.76(±0.05) | 0.762(±0.055) | 0.762(±0.055) | 0.781(±0.035) | 0.768(±0.062) | 0.76(±0.05) | 0.75(±0.135) | 0.765(±0.031) | 0.81(±0.052) |
Task: DANN (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.689(±0.107) | 0.778(±0.069) | 0.778(±0.069) | 0.778(±0.069) | 0.756(±0.038) | 0.667(±0.088) | 0.656(±0.077) | 0.767(±0.0) | 0.778(±0.019) |
| 20→30 | 0.795(±0.068) | 0.846(±0.0) | 0.846(±0.0) | 0.859(±0.022) | 0.865(±0.033) | 0.795(±0.068) | 0.795(±0.068) | 0.846(±0.019) | 0.885(±0.051) |
| 35→31 | 0.651(±0.107) | 0.77(±0.014) | 0.77(±0.014) | 0.77(±0.027) | 0.659(±0.159) | 0.651(±0.107) | 0.651(±0.107) | 0.77(±0.027) | 0.746(±0.027) |
| 6→19 | 0.788(±0.124) | 0.914(±0.046) | 0.914(±0.046) | 0.924(±0.052) | 0.854(±0.126) | 0.788(±0.124) | 0.788(±0.124) | 0.899(±0.061) | 0.919(±0.052) |
| 7→18 | 0.698(±0.033) | 0.704(±0.029) | 0.704(±0.029) | 0.648(±0.029) | 0.591(±0.104) | 0.698(±0.033) | 0.623(±0.191) | 0.61(±0.054) | 0.698(±0.029) |
| Avg. | 0.724(±0.088) | 0.802(±0.032) | 0.802(±0.032) | 0.796(±0.04) | 0.745(±0.092) | 0.72(±0.084) | 0.702(±0.113) | 0.778(±0.032) | 0.805(±0.036) |
Task: DSAN (criteria groups: Heuristic / Theoretical error guarantees)

| Task | SO | TMV | TMR | TCR | SOR | IWV | DEV | IWA (ours) | TB |
|---|---|---|---|---|---|---|---|---|---|
| 18→23 | 0.756(±0.051) | 0.767(±0.033) | 0.767(±0.033) | 0.778(±0.038) | 0.711(±0.051) | 0.756(±0.051) | 0.756(±0.051) | 0.744(±0.019) | 0.8(±0.0) |
| 20→30 | 0.872(±0.011) | 0.859(±0.022) | 0.859(±0.022) | 0.846(±0.0) | 0.84(±0.011) | 0.872(±0.011) | 0.731(±0.233) | 0.865(±0.0) | 0.872(±0.011) |
| 35→31 | 0.571(±0.024) | 0.786(±0.0) | 0.786(±0.0) | 0.778(±0.014) | 0.698(±0.151) | 0.571(±0.024) | 0.571(±0.024) | 0.778(±0.014) | 0.817(±0.027) |
| 6→19 | 0.879(±0.015) | 0.864(±0.052) | 0.864(±0.052) | 0.833(±0.052) | 0.889(±0.009) | 0.879(±0.015) | 0.662(±0.363) | 0.838(±0.049) | 0.879(±0.009) |
| 7→18 | 0.717(±0.05) | 0.572(±0.076) | 0.572(±0.076) | 0.547(±0.068) | 0.648(±0.054) | 0.717(±0.05) | 0.597(±0.076) | 0.384(±0.047) | 0.717(±0.05) |
| Avg. | 0.759(±0.03) | 0.769(±0.037) | 0.769(±0.037) | 0.756(±0.035) | 0.757(±0.055) | 0.759(±0.03) | 0.663(±0.149) | 0.722(±0.026) | 0.817(±0.019) |
2.3 SymbolicAI: A framework for logic-based approaches combining generative models and solvers

SYMBOLICAI: A FRAMEWORK FOR LOGIC-BASED APPROACHES COMBINING GENERATIVE MODELS AND SOLVERS

Marius-Constantin Dinu* Claudiu Leoveanu-Condrei Markus Holzleitner*
Werner Zellinger# § Sepp Hochreiter*
ExtensityAI Johannes Kepler University* RICAM(P) Amazon Devices

ABSTRACT

We introduce SymbolicAI, a versatile and modular framework employing a logic-based approach to concept learning and flow management in generative processes. SymbolicAI enables the seamless integration of generative models with a diverse range of solvers by treating large language models (LLMs) as semantic parsers that execute tasks based on both natural and formal language instructions, thus bridging the gap between symbolic reasoning and generative AI. We leverage probabilistic programming principles to tackle complex tasks, and utilize differentiable and classical programming paradigms with their respective strengths. The framework introduces a set of polymorphic, compositional, and self-referential operations for multi-modal data that connect multi-step generative processes and align their outputs with user objectives in complex workflows. As a result, we can transition between the capabilities of various foundation models with in-context learning capabilities and specialized, fine-tuned models or solvers proficient in addressing specific problems. Through these operations based on in-context learning, our framework enables the creation and evaluation of explainable computational graphs. Finally, we introduce a quality measure and its empirical score for evaluating these computational graphs, and propose a benchmark that compares various state-of-the-art LLMs across a set of complex workflows. We refer to the empirical score as the "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short. The framework codebase¹ and benchmark² are linked below.
Figure 1: Our neuro-symbolic framework enables a seamless transition between symbolic and differentiable programming, each with distinct dynamics and strengths. Differentiable programming provides access to foundational and specialist models. Classical programming, on the other hand, shifts between abstraction and implementation, focusing on high-level concepts before delving into the details of implementation.

1 SymbolicAI framework: https://github.com/ExtensityAI/symbolicai
2 Evaluation benchmark: https://github.com/ExtensityAI/benchmark
  • Correspondence to: dinu@ml.jku.at, {marius, leo}@extensity.ai
§ Work done outside of Amazon.

1 INTRODUCTION

The recent surge in generative AI, particularly involving large language models (LLMs), has demonstrated their wide-ranging applicability across various domains (Badita, 2022; Degrave, 2022). These models have enhanced the functionality of tools for search-based interactions (YouWrite, 2022; Writesonic, 2022; Microsoft, 2023), program synthesis (Jain et al., 2021; Romera-Paredes et al., 2023; Key et al., 2023), chat-based interactions (ReplikaAI, 2016; OpenAI, 2022; Google, 2023), and many more. Moreover, language-based approaches have facilitated connections between different modalities, enabling text-to-image (Ramesh et al., 2021; Saharia et al., 2022), text-to-video (Singer et al., 2022), text-to-3D (Poole et al., 2022), text-to-audio (Oord et al., 2016; Wang et al., 2017), and text-to-code (Wang et al., 2021b; Lu et al., 2021; Li et al., 2022b) transformations, to name a few. Consequently, by training on vast quantities of unlabelled textual data, LLMs have been shown to not only store factual knowledge (Petroni et al., 2019; Kassner et al., 2020) and approximate users' intentions to some extent (Andreas, 2022), but also to unlock deep specialist capabilities through innovative prompting techniques (Nori et al., 2023).
Despite their versatility, current LLMs face challenges such as fallacious reasoning and the generation of erroneous content, commonly referred to as hallucinations (Jones & Steinhardt, 2022). These limitations highlight the importance of integrating complementary symbolic methods to validate and guide the generative processes of LLMs, ensuring more accurate and reliable outputs. In parallel, efforts have focused on developing tool-based approaches (Schick et al., 2023) or template frameworks (Chase, 2023) to extend LLMs' capabilities and enable a broader spectrum of applications. However, these efforts only partially capture the potential inherent in leveraging LLMs as semantic parsers. In contrast to parsers for structured languages, a semantic parser is able to break down unstructured human language into semantically meaningful components and transform those into a structured form. While semantic parsing has traditionally been a role filled by specialized algorithms and models, we posit that LLMs, through their language processing capabilities, can fill this role. In turn, we identify LLMs as a central component in creating sophisticated neuro-symbolic (NeSy) AI systems. These systems integrate symbolic and sub-symbolic concepts and utilize the capabilities of semantic parsing to develop symbolic expressions that enable new probabilistic programming paradigms.
We introduce SymbolicAI, a compositional NeSy framework able to represent and manipulate multi-modal and self-referential structures (Schmidhuber, 2007; Fernando et al., 2023). SymbolicAI augments the generative process of LLMs with in-context learning operations, realized through functional primitives, and enables the creation of versatile applications through in-context learning (Wei et al., 2022a). These operations enable logic-based components that guide the generative process and enable a modular NeSy system, including a wide range of existing solvers, formal language engines for mathematical expression evaluation, theorem provers, knowledge bases, and search engines for information retrieval. SymbolicAI exposes these solvers as building blocks for constructing compositional functions as computational graphs, making it possible to bridge classical and differentiable programming paradigms with the aim to create domain-invariant problem solvers. In designing the architecture of SymbolicAI, we drew inspiration from a body of evidence that suggests the human brain possesses a selective language processing module (Macsweeney, 2002; Fedorenko et al., 2010; Menenti et al., 2011; Regev et al., 2013; Scott et al., 2016; Deniz et al., 2019; Hu et al., 2022), prior research on cognitive architectures (Newell & Simon, 1956; Newell et al., 1957; Newell & Simon, 1972; Newell, 1990; Laird, 2022), and the significance of language on the structure of semantic maps in the human brain (Huth et al., 2016). We consider language as a central processing module, distinct from other cognitive processes such as reasoning or memory (Paischer et al., 2022; 2023). We hypothesize that such a central processing module based in language is a core component of broad AI systems (see Appendix Section A) and enables the development of fully autonomous AI systems for decision-making.
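The idea of such functional primitives can be sketched in a few lines of Python. The `Symbol` class and the `semantic_backend` stub below are hypothetical illustrations of the operator-overloading pattern, not the actual SymbolicAI API; in a real system the backend would prompt an LLM acting as a semantic parser:

```python
# Minimal, self-contained sketch of LLM-backed polymorphic operations.
# `semantic_backend` stands in for an LLM call; here it is a deterministic
# stub so the example is runnable offline.
def semantic_backend(instruction: str, payload: str) -> str:
    # A real backend would send the instruction and payload to an LLM
    # and return its generated completion.
    return f"[{instruction}]({payload})"

class Symbol:
    """A value wrapper whose operators dispatch to a generative backend."""

    def __init__(self, value: str):
        self.value = value

    def __truediv__(self, other: "Symbol") -> "Symbol":
        # Polymorphic operation: '/' is reinterpreted semantically as
        # "decompose", not as numeric division.
        return Symbol(semantic_backend("decompose", f"{self.value} by {other.value}"))

    def query(self, instruction: str) -> "Symbol":
        # Compositional: each call extends the computational graph by one node.
        return Symbol(semantic_backend(instruction, self.value))

expr = Symbol("All birds can fly. Penguins are birds.").query("draw a conclusion")
print(expr.value)  # → [draw a conclusion](All birds can fly. Penguins are birds.)
```

Because every operation returns a new `Symbol`, calls chain into an explicit trace of the multi-step generative process, which is what makes the resulting computational graph inspectable.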
A significant challenge encountered in the development of our framework pertained to the evaluation of LLMs when used as semantic parsers in a NeSy workflow. Current evaluation of generated content relies on metrics for single-step generative processes, such as the BLEU score (Papineni et al., 2002). These metrics are not suitable for evaluating multi-step generative processes. BLEU has limitations, as it measures the n-gram overlap of generated output with a reference and does not consider semantic meaning. As a result, BLEU fails to capture semantic equivalence, especially in more complex tasks. More recent metrics such as CIDEr (Vedantam et al., 2014) or SPICE (Anderson et al., 2016) are also not suitable for our problem, either because they are built upon BLEU (in the case of CIDEr) or designed with inductive biases specifically for image captioning.
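The limitation is easy to demonstrate. A simplified single-order n-gram precision (the core ingredient of BLEU, here without brevity penalty or multiple references) assigns zero overlap to a faithful paraphrase:

```python
from collections import Counter

def ngram_precision(candidate: str, reference: str, n: int = 2) -> float:
    """Clipped n-gram precision, a simplified stand-in for BLEU:
    single reference, single n-gram order, no brevity penalty."""
    def ngrams(text: str) -> Counter:
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))
    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # clipped counts via Counter intersection
    total = sum(cand.values())
    return overlap / total if total else 0.0

reference = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"   # same meaning, no shared bigrams
print(ngram_precision(paraphrase, reference))  # → 0.0 despite semantic equivalence
print(ngram_precision(reference, reference))   # → 1.0
```

The paraphrase scores 0.0 even though a human would judge it semantically equivalent, which is precisely why embedding-based comparison is needed for multi-step workflows.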
Therefore, alongside our framework we introduce a quality measure (and its empirical score) for multi-step generative processes based on semantic meaning. We refer to our score as "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short. Our VERTEX score uses embeddings to compare node distributions within a computational graph. It measures the semantic meaning across the distributional path by computing at each node the cross-similarity between the generated embeddings and embeddings sampled from a reference
distribution. Furthermore, the VERTEX score is designed such that it can be used as a reward signal in a reinforcement learning setting (Sutton, 1984). Finally, we propose a benchmark for evaluating complex workflows. We define a set of basic evaluations, particularly associative predictions based on in-context learning, multi-modal bindings for tool utilization, and program synthesis for subroutine execution. Furthermore, we introduce complex evaluations for logic-based components and hierarchical computational graphs.
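A toy version of the node-wise cross-similarity described above can be sketched as follows; the random embeddings, the plain cosine similarity, and the simple averaging along the trajectory are illustrative assumptions, not the exact VERTEX definition:

```python
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def vertex_like_score(generated, references) -> float:
    """At each node of the computational graph, compare the generated
    embedding against embeddings sampled from a reference distribution,
    then aggregate along the trajectory. Toy stand-in for VERTEX."""
    per_node = []
    for gen_emb, ref_embs in zip(generated, references):
        sims = [cosine(gen_emb, r) for r in ref_embs]
        per_node.append(np.mean(sims))  # cross-similarity at this node
    return float(np.mean(per_node))     # aggregate over the whole path

rng = np.random.default_rng(0)
refs = [rng.normal(size=(5, 8)) for _ in range(3)]  # 3 nodes, 5 reference samples each
gens = [r.mean(axis=0) for r in refs]               # generations close to the references
print(vertex_like_score(gens, refs))
```

Because the result is a bounded scalar per trajectory, such a score can directly serve as a reward signal in the reinforcement learning setting mentioned above.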
In summary, the key contributions presented in this work are as follows:
  • We introduce SymbolicAI, a logic-based framework for concept learning and flow management in generative processes, enabling seamless integration with a wide range of foundation models and solvers.
  • We leverage LLMs as semantic parsers to enable the creation of complex computational graphs by combining symbolic expressions with probabilistic programming paradigms.
  • We introduce a quality measure and its empirical score alongside a benchmark designed for multi-step generative processes for comparing LLMs across a wide range of complex tasks.

2 RELATED WORK

Symbolic Methods The field of symbolic AI has its foundations in the works of the Logic Theorist (LT) (Newell & Simon, 1956) and the General Problem Solver (GPS) (Newell et al., 1957). These programs represented the first steps towards automated reasoning and problem-solving utilizing symbolic representations. Despite their advancements, both faced challenges in dealing with the complexity of real-world problems, particularly due to the combinatorial nature of the solution space. To address these limitations, the Soar (Laird et al., 1987) cognitive architecture was developed, advancing the notion that intelligent behavior results from goal-oriented search through a problem space (Newell & Simon, 1972; McCarthy et al., 2006), with each step consisting of selecting and applying operators. Soar introduced components like reinforcement learning, impasses, sub-states, and chunking to enhance its problem-solving capabilities. It also demonstrated the importance of learning from experiences to adapt and improve performance over time. However, Santoro et al. (2022) emphasizes the subjectivity of symbols and suggests that human-like symbolic fluency could develop in machines through learning algorithms immersed in socio-cultural contexts. This perspective, anchored in the notion that symbols are triadic and their meaning emerges from consensus, seeks to move away from traditional symbolic AI methodologies towards AI that adaptively learns meaning and behaviors from human-like experiences. The goal is to cultivate machines that demonstrate symbolic behaviors across a spectrum of competencies, potentially mirroring the evolutionary and social learning processes observed in humans. Lastly, symbolic AI struggles with real-world data's unpredictability and variability. 
These challenges have led to the employment of statistical learning methodologies, like deep learning (Alom et al., 2018), which are more adept at managing noise and uncertain information through vector-valued representations.
Sub-Symbolic Methods The sub-symbolic framework, rooted in neural network paradigms, began with pioneering works such as the perceptron (McCulloch & Pitts, 1943), with the first hardware implementation quickly following (Rosenblatt, 1958). The foundational notion of distributed processing (Rumelhart et al., 1986) was later bolstered and further expanded by demonstrating that multilayer feedforward networks with a single hidden layer can serve as universal approximators for any Borel measurable function, given sufficient hidden units (Hornik et al., 1989). Fast-forward, contemporary frameworks achieve a significant leap with the introduction of the Transformer architecture (Vaswani et al., 2017), which underpins most of today's LLMs. These LLMs demonstrate exceptional capabilities in in-context learning, a method popularized by the likes of GPT-3 (Brown et al., 2020), where models improve task performance through natural language instruction and examples provided directly in the input prompt. While in-context learning bypasses the need for explicit retraining, it demands meticulous prompt design to steer models towards desired behaviors.
Neuro-Symbolic Methods To overcome the limitations of each individual method, NeSy approaches meld the statistical inference strengths of deep neural architectures with the generalization and explainability of symbolic systems (Garcez et al., 2015; Besold et al., 2017; d'Avila Garcez et al., 2019; d'Avila Garcez & Lamb, 2020; Lamb et al., 2020; Hamilton et al., 2022; Yu et al., 2023). Some approaches focus on different strategies for integrating learning and reasoning processes (Yu et al., 2023; Fang et al., 2024). Firstly, learning-for-reasoning methods treat the learning aspect as an accelerator for reasoning, in which deep neural networks are employed to reduce the search space for symbolic systems (Silver et al., 2016; 2017b;a; Qu & Tang, 2019; Schrittwieser et al., 2020). Secondly, reasoning-for-learning methods view reasoning as a way to regularize learning, in which symbolic knowledge acts as a guiding constraint on the learning process. Thirdly, learning-reasoning methods establish a symbiotic relationship between learning and reasoning, in which both elements interact and share information to boost problem-solving capabilities (Donadello et al., 2017; Manhaeve et al., 2018; Mao et al., 2019; Ellis, 2023). This synergy extends further to graph-based methods, which closely align with the objectives of our proposed framework. Research in this area, such as CycleGT (Guo et al., 2020) and Paper2vec (Ganguly & Pudi, 2017), explores unsupervised techniques for bridging graph and text representations, while GPTSwarm (Zhuge et al., 2024) explores graph optimizers to refine node-level prompts and edge structures. Subsequently, graph embeddings, when utilized within symbolic frameworks, can enhance knowledge graph reasoning tasks (Zhang et al., 2021) or, more generally, provide the bedrock for learning domain-invariant representations (Park et al., 2023).
Lastly, building upon the insights from Sun et al. (2022), the integration of NeSy techniques in scientific workflows promises significant acceleration in scientific discovery. While previous work has effectively identified opportunities and challenges, we have taken a more ambitious approach by developing a comprehensive framework from the ground up to facilitate a wide range of NeSy integrations.
Large Language Models In part, instruction-based fine-tuning of LLMs through reinforcement learning from human feedback (Ouyang et al., 2022; Li et al., 2023) or direct preference optimization (Rafailov et al., 2023) has shown promising results in dealing with value misalignment issues (Bradley Knox & Stone, 2008; MacGlashan et al., 2017; Christiano et al., 2017; Ibarz et al., 2018; Goyal et al., 2022), unlocking new possibilities for chain-of-thought interactions (Wei et al., 2022b). However, recent research also highlights the limitations of LLMs in functional linguistic competence despite their proficiency in formal linguistic competence (Mahowald et al., 2023). Whereas formal linguistic competence encompasses the ability to understand and generate language, functional linguistic competence pertains to the application of language in real-world contexts, such as conveying sensory input or recalling information from memory. Examples of functional linguistic competence include implicatures (Ruis et al., 2022) and contextual language comprehension beyond the statistical manifestation of data distributions (Bransford & Johnson, 1972; Mikolov et al., 2013b). Consequently, operating LLMs through a purely inference-based approach confines their capabilities within their provided context window, severely limiting their horizon. This results in deficiencies in situational modeling and adaptability. Moreover, simply increasing the context length may not yield greater capabilities, as demonstrated by the observed U-shaped performance curve (Liu et al., 2023): LLMs excel when utilizing information at the beginning or end of the input context, but struggle with information located in the middle, especially as the context grows. These challenges are actively being researched, with novel approaches such as Hyena (Poli et al., 2023), RWKV (Bo, 2021), GateLoop (Katsch, 2023), Mamba (Gu & Dao, 2023) and xLSTM (Beck et al., 2024) surfacing.
Meanwhile, the re-emergence of interest in retrieval-augmented generative approaches (Li et al., 2022a) offers an alternative by circumventing the autoregressive nature of the widely-utilized Transformer architecture (Vaswani et al., 2017), enabling context enrichment with lateral information.
In-Context Learning Recently, several in-context learning methodologies evolved to enable tool usage through LLMs (Schick et al., 2023), or refine the generative outcome of LLMs (Yang et al., 2023). This includes chain-of-thought (CoT) prompting, a method that conditions the model to reveal its step-by-step reasoning process (Wei et al., 2022b; Singhal et al., 2023). CoT prompting breaks down complex tasks into simpler, sequential steps, and helps with interpreting LLM's output. Self-generated CoT, where models are encouraged to generate their own reasoning chains based on training examples, surpasses even expertly crafted CoT (Fernando et al., 2023). This observation echoes other reports that GPT-4 has an emergent self-improving capability through introspection, such as self-verification (Weng et al., 2023) or self-consistency (Wang et al., 2023b). Tree of Thoughts (ToT) enables LLMs to solve complex problems by exploring multiple reasoning paths through a search tree of coherent text units, demonstrating significant problem-solving enhancements in tasks requiring strategic planning and search (Yao et al., 2023a). Ensemble techniques further enhance the robustness and accuracy of model predictions by combining several strategies to establish a consensus (Nori et al., 2023).
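A chain-of-thought prompt of the kind described above can be assembled mechanically. The function below is a minimal sketch; the example questions and the "Let's think step by step." cue are illustrative choices, not a prescribed format.

```python
def build_cot_prompt(examples, question):
    """Assemble a few-shot chain-of-thought prompt.

    examples: list of (question, reasoning, answer) triples shown to the model
    question: the new task instance, appended last and left open for completion
    """
    parts = []
    for q, reasoning, answer in examples:
        parts.append(f"Q: {q}\nA: Let's think step by step. {reasoning} "
                     f"The answer is {answer}.")
    # The final question ends with the reasoning cue so the model continues it.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

prompt = build_cot_prompt(
    [("If I have 3 apples and buy 2 more, how many do I have?",
      "3 apples plus 2 apples equals 5 apples.", "5")],
    "A train travels 60 km in 1 hour. How far does it travel in 3 hours?",
)
```

Self-consistency and ToT variants would sample this prompt multiple times and aggregate or search over the resulting reasoning paths.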

3 PROBLEM DEFINITION

Conventional approaches employing foundation models, such as LLMs, are predominantly confined to single-step or few-step executions and primarily reliant on hand-crafted prompt instructions, often referred to as in-context learning. This restricted scope limits the utilization of different modalities, lacks verification, and exhibits limited tool proficiency. We posit that the use of NeSy engines as core computation units, realized through logic-based methodologies coupled with sub-symbolic foundation models, offers a more general, robust, and verifiable perspective. This approach has several advantages. Firstly, it enables the integration of pre-existing solutions (e.g. various classical algorithms). Secondly, it allows the foundation model to focus on decision-making (e.g. selecting the respective tool based on in-context classification). Thirdly, it provides an interpretable language-based control layer for explainable, autonomous systems. In the following section, we elaborate on the key design principles underlying SymbolicAI and how we guide the generative processes of NeSy engines. For further technical details, see Appendix Section 5.

4 DESIGN PRINCIPLES

Symbols and Expressions As posited by Newell & Simon (1976), symbols are elemental carriers of meaning within a computational context³. These symbols define physical patterns capable of composing complex structures, and are central to the design and interpretation of logic and knowledge representations (Augusto, 2022). We define a symbol as an element of the set $S = \bigcup_{n \geq 0} L^n$ formed by concatenating characters from a finite character set $L$, i.e. the vocabulary in an LLM setting, with $n$ representing the sequence length of the string. Thus, let $S$ denote the set of all possible symbols; an expression takes $m$ symbols from $S$ and, when evaluated, returns a new symbol in $S$. For any subset $\{S_1, S_2, \ldots, S_m\} \subseteq S$, an expression is defined as a mapping $\omega\colon \prod_{i=1}^{m} S_i \to S$ from the set of all possible expressions $\omega \in \Omega$, where $S_i \subseteq S$, and $\circ$ represents the placeholder operation of combining and transforming the symbols according to specific rules for $m$ symbols. Such a specific rule for $\circ$ can define an arithmetic expression $\circ := +$, where two symbols are added, i.e. $\omega :=$ "1" + "two", which results in a new symbol "3" or "three". Thus, SymbolicAI is based on the concept that symbols, and the expressions they form, are reflections of the information inherent in a NeSy system, and serve as surrogates for the interaction between the NeSy system and the problem space. Moreover, we argue that real patterns (Dennett, 1991), recurring and identifiable structures that coherently and reliably emerge in the data beyond mere randomness or noise, can be effectively realized through symbols.
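The arithmetic example ω := "1" + "two" can be mimicked with a toy evaluator. The lookup table below is a hypothetical stand-in for the NeSy engine, which would resolve such associations semantically rather than by table lookup.

```python
# Toy stand-in for a NeSy engine: maps surface forms of numbers to values.
# A real engine would resolve these associations via an LLM, not a lookup table.
LEXICON = {"1": 1, "one": 1, "2": 2, "two": 2, "3": 3, "three": 3}

def evaluate_add(left: str, right: str) -> str:
    """Expression with the rule 'o := +': combine two symbols into a new symbol."""
    result = LEXICON[left.lower()] + LEXICON[right.lower()]
    return str(result)

evaluate_add("1", "two")  # -> "3"
```

The point of the sketch is that the operands are strings (symbols), while the combination rule determines the semantics of the resulting symbol.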
Furthermore, we utilize language as a tool for mapping complex concepts, leveraging its inherent semantics and abstractions to describe states and properties of a problem at hand. These mappings are universal, e.g. they may be utilized to define scene descriptions, long-horizon planning, acoustic properties, emotional states, physical conditions, etc. Therefore, language serves as a comprehensive, yet abstract framework to encapsulate meanings, and we refer to it as the convex hull of the knowledge of our society. Subsequently, it is common to attribute existing physical objects with abstract concepts, as exemplified by our natural tendency to link tangible objects to colors and emotions, such as blending the color "red" with "heart", "warm", and "passion". This approach also anchors our work in the field of formal language theory, as we require a structured method to construct mappings from the world to language. Consequently, we use formal language structures, such as grammars, to systematically define our language-centric approach to problem-solving and the associated translation of real-world complexities into linguistic terms.
Formal Languages In formal language theory and linguistics, languages are structured following the Chomsky hierarchy, which classifies languages by the complexity of their grammatical structure (Chomsky, 1956). This hierarchy defines four types of grammars (Type-3 to Type-0) and separates formal languages by their grammatical complexity. A grammar in this context consists of terminal and non-terminal symbols, production rules, and a designated start symbol, enabling the generation of valid strings within a language.
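A grammar G = (N, T, P, S) of the kind described above can be made concrete with a tiny context-free example; the production rules below are purely illustrative and not part of the framework.

```python
import random

# G = (N, T, P, S): non-terminals, terminals, production rules, start symbol S.
P = {
    "S": [["NP", "VP"]],
    "NP": [["the", "engine"], ["the", "symbol"]],
    "VP": [["returns", "NP"]],
}
TERMINALS = {"the", "engine", "symbol", "returns"}

def derive(symbol, rng):
    """Expand a symbol by repeatedly applying production rules until terminal."""
    if symbol in TERMINALS:
        return [symbol]
    rule = rng.choice(P[symbol])
    return [token for part in rule for token in derive(part, rng)]

sentence = " ".join(derive("S", random.Random(0)))
```

Every derivation from this grammar yields a valid five-token string of the form "the ... returns the ...", illustrating how production rules constrain the set of generable symbols.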
We define a NeSy engine as a mapping $V_S\colon \Omega \times N \times T \to P$, where $N$ is a set of non-terminal symbols, $T$ is a set of terminal symbols with $N \cap T = \emptyset$, and $S$ is a starting symbol. We further formalize a grammar $G = (N, T, P, S)$ with production rules defined as $P := V_S(\omega, N, T)$. This grammar describes the generation of symbols through expressions $\omega$. For simplicity, we drop the subscript of $V_S$ and write $V$. We identify LLMs as promising candidates for functioning as part of NeSy engines. In SymbolicAI, a symbol $S$ is augmented with conditional instructions and types derived from DSLs, custom-defined or not (e.g. HTML, SQL, etc.), tailored for directing the LLMs. The key advantage of LLMs over previous systems lies in their ability to generalize across formal languages (Wang et al., 2023a) and knowledge systems. Although there is currently no universal consensus regarding the precise classification of natural language within the Chomsky hierarchy, our approach can be understood as employing a situation-specific, context-sensitive grammar, which enables the processing of instructions and analogies with a nuanced understanding of language. The intersection between formal and natural languages becomes evident when considering how language patterns, through prompts like "You are a helpful assistant...", elicit structured responses, indicating a potential underlying formal mechanism at play. This observation underlines the utility of such a grammar in our framework, where it serves as an explicit schema guiding the structure of examples for in-context learning. For instance, equating "3.1415..." with "$\pi$" or "August 4, 1961" with "1961-08-04" in a given context demonstrates context-dependent interpretation of symbols. Such a system does not rigidly adhere to standard grammatical rules but instead adjusts and interprets based on the context, effectively creating a situation-specific grammar, capable of forming Domain-Invariant Associations through in-context learning. We further address this in a later paragraph.

³ Our framework's name is derived from the foundational work of Newell and Simon.

Figure 2: Illustration of the NeSy pipeline, showcasing conceptual usage of in-context learning methodologies, domain-specific language (DSL) structures, and expression evaluations through a NeSy engine based on an LLM and constraint verification. The expression showcases the sorted insert operator $\ll$ and how the information of the symbol B is included in the symbol AC. The violet placeholder in the DSL Prompt represents an instruction, such as "Insert the right-hand side value into the left-hand side value in ascending order:". The positions below represent task-specific few-shot examples. The DSL Prompt receives the expression $\omega_{\ll}$ and maps it to $\hat{\omega}_{\ll}$, which can be processed by the LLM-based NeSy function $V_S$ and outputs a new symbol.
Function Composition In SymbolicAI, we use function composition to construct complex hierarchies and behaviors from fundamental elements. Our framework thus enables the modeling of interconnected processes, where the output of one function is used as input for another, creating a sequence of operations. Through function composition, we construct computational graphs, in which intermediate symbols represent the nodes or states within these graphs. Formally, function composition is denoted by $\circ$, where combining functions $f$ and $g$ yields a new function $h = g \circ f$, defined as $h(x) = g(f(x))$. For functions $f\colon X \to Y$ and $g\colon Y \to Z$, their composition results in a function mapping elements from domain $X$ to codomain $Z$ through $g(f(x))$. Although traditionally the codomain of the inner function must match the domain of the outer function, LLM-based evaluation relaxes any strict subset relationship between these domains and codomains, which is particularly beneficial for in-context learning. When using LLMs for NeSy production rules $V$, we can derive a multi-step generative process by composing a computational graph as a sequence of zero- and few-shot function compositions:
$V(\omega_j, N, T) = V(\omega_{j-1}, \cdot) \circ V(\omega_{j-2}, \cdot) \circ \cdots \circ V(\omega_0, \cdot), \qquad (1)$
where $\omega_0$ is the initial instruction and $j$ defines the index variable for a multi-step generative process. By leveraging functional in-context learning, where zero- and few-shot examples act as dynamic elements of the function's domain, SymbolicAI has the ability to interpret and respond to diverse input contexts. For instance, a function can classify a user request and select an appropriate interface (e.g. WolframAlpha) to process the request. The output modality may even vary based on the respective engine. This enables SymbolicAI to handle operations over multi-modal data that connect multi-step generative processes, and establishes function composition as a central tenet in bridging multiple modalities and coordinating a variety of tasks.
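A multi-step chain in the spirit of Equation (1) can be sketched with ordinary function composition. The three single-step functions below are stubs standing in for zero- or few-shot NeSy engine calls; their names and behavior are illustrative.

```python
from functools import reduce

def compose(*fns):
    """Right-to-left composition: compose(g, f)(x) == g(f(x))."""
    return reduce(lambda g, f: lambda x: g(f(x)), fns)

# Stubbed single-step generative functions; a real system would invoke the
# NeSy engine with zero-/few-shot context at each step.
def extract(text):       # V(omega_0, .): pull the content after the speaker tag
    return text.split(":")[-1].strip()

def normalize(s):        # V(omega_1, .): canonicalize the intermediate symbol
    return s.lower()

def classify(s):         # V(omega_2, .): route based on the symbol's form
    return "question" if s.endswith("?") else "statement"

pipeline = compose(classify, normalize, extract)
pipeline("User: What Is The Time?")  # -> "question"
```

Each intermediate return value corresponds to a node in the computational graph, and the classification step illustrates how a composed function can route a request to a suitable interface.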
Domain-Invariant Associations In-context learning enabled LLMs to become versatile task solvers by interpolating within the training distribution, to the extent that even potentially unseen tasks are addressable (Brown et al., 2020). We attribute this to associations formed within the input space and the capacity of Transformer architectures for defining domain-invariant feature sub-spaces. This phenomenon has parallels with few-shot learning approaches such as SubGD (Gauch et al., 2022), a method based on identifying and utilizing a low-dimensional subspace, learned from various tasks that effectively regularize the learning process. Since LLMs have been trained on different domains and tasks, which also include formulations of mathematical expressions, we posit that specific tokens, such as the equality sign, can be leveraged to associate meanings between different symbolic objects. Unlike domain-invariant representations that create invariant features across different learning tasks, our approach leverages the in-context generalization capability of LLMs to construct invariant symbolic associations that aim to preserve, manipulate and propagate situational context. We can use these properties to build operations that apply transformations on objects that are substitutes to the semantically aligned few-shot learning examples.

5 SYMBOLICAI FRAMEWORK

In this section, we discuss the specifics of the proposed SymbolicAI framework. For more details about the framework structure, see Appendix Section C. For installation and usage of our framework, see Appendix Section D. For more technical details and code snippets, see Appendix Section E.
Types and Representations Analogous to the type object in Python, the base type of SymbolicAI is a symbol represented by the base type Symbol. All other subtypes, such as Expression, represent their mathematical namesake and can be evaluated and simplified. These subtypes inherit from Symbol the base attributes, primitive operators, and helper methods.
Although SymbolicAI uses a language-centric design, modeling and manipulating every interaction as symbolic representations is not inherently efficient. Therefore, we establish mappings between symbolic and sub-symbolic representations for sensory inputs and non-discrete elements. Such mappings are typically realized through function approximation. This allows us to map between modality-to-language and language-to-modality use cases. Here, modality serves as a placeholder for various types such as text, image, video, audio, motion, etc. In turn, each Symbol object contains valued and vector-valued representations, obtained through the value and embedding attributes. The former holds a symbol's current value, while the latter holds a vector-valued representation, e.g. embedded text stored as a PyTorch tensor (Paszke et al., 2019) or NumPy array (Harris et al., 2020). While for an LLM the numerical tensors may lack inherent meaning, vector-valued representations play an important role when 1) composite symbols are combined into more complex expressions, and 2) these embedded tensors are updated through gradient-based optimization.
To enable the processing of symbols by LLMs, we assume that each Symbol object implements Python's native string functionality, where the __str__ method returns an interpretable string representation. Therefore, we can assert that any Python object is parsable by an LLM; however, the user must ensure a meaningful representation. For more details, see Appendix Section E.
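The base-type design described above can be sketched as follows. The attribute and method names mirror the text (value, embedding, __str__), but the class bodies are simplified assumptions rather than the framework's actual implementation.

```python
import numpy as np

class Symbol:
    """Base type: holds a value and an optional vector-valued embedding."""
    def __init__(self, value, embedding=None):
        self.value = value
        self.embedding = embedding  # e.g. a NumPy array or a torch.Tensor

    def __str__(self):
        # Every Symbol must expose an interpretable string for the LLM.
        return str(self.value)

class Expression(Symbol):
    """Subtype that can be evaluated and simplified into a new Symbol."""
    def __init__(self, *operands):
        super().__init__(value=None)
        self.operands = operands

    def evaluate(self):
        # Simplified: concatenate operand representations; the framework
        # would instead dispatch the evaluation to the NeSy engine.
        return Symbol(" ".join(str(o) for o in self.operands))

expr = Expression(Symbol("hello"), Symbol("world", embedding=np.zeros(3)))
str(expr.evaluate())  # -> "hello world"
```

Because __str__ is the only contract, any Python object with a meaningful string representation can participate in an expression.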
Polymorphic Context Polymorphism is a central concept in programming language theory and prominently featured in SymbolicAI. Polymorphism refers to the ability of different objects to be accessed through the same interface, or of a single identifier to represent different types based on the context of execution. Providing a single interface for entities of different types allows operations to be performed in ways specific to their derived types. We therefore designed the Symbol object to contain a global context, which is composed of static and dynamic context parts, and enables this polymorphic behavior. The static context is class-dependent and defined at design time. The dynamic context is runtime-adaptable and can be changed to adhere to runtime-specific logic and changes. Moreover, Symbol-associated operations are resolved following polymorphic design before being evaluated by the NeSy engine. SymbolicAI's engine implementation contains a prepare method to resolve and compile the engine-specific representation by evaluating the Symbol-specific operations and context. For an example of polymorphic context, see part a) in Figure 3.
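The split between static and dynamic context can be sketched as follows. The class and attribute names are illustrative, and the prepare method here simply concatenates the context parts into a prompt, standing in for the engine's actual resolution step.

```python
class Expression:
    # Static context: fixed at design time, shared by all instances of the class.
    _static_context = "You are a helpful assistant."

    def __init__(self):
        # Dynamic context: adaptable at runtime.
        self.dynamic_context = []

    def prepare(self, user_input):
        """Compile an engine-specific prompt from both context parts."""
        return "\n".join([self._static_context, *self.dynamic_context, user_input])

class SQLExpression(Expression):
    # A subtype overrides the static context: same interface, different behavior.
    _static_context = "Combine the following statements into one SQL query."

base, sql = Expression(), SQLExpression()
sql.dynamic_context.append("Dialect: PostgreSQL")
```

Calling prepare on either object goes through the same interface, but the resolved prompt differs by the derived type's static context plus whatever dynamic context was attached at runtime.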
Operators and Methods In SymbolicAI, operators are overloaded to facilitate transformations of Symbol objects. These operator primitives employ dynamic casting to ensure type compatibility. Consequently, Symbol objects can be easily manipulated through type-specific attributions or symbolically evaluated by the NeSy engine. For example, equality operations primarily adhere to the type-specific implementation, because we prioritize strict comparisons over probabilistic evaluations. If the evaluation is unsuccessful, we then consider semantic equality through the NeSy engine. SymbolicAI leverages decorators for composing operators and custom class methods. For more details, see Appendix Section C.
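The strict-then-semantic evaluation order for the equality operator can be sketched as follows; the _semantic_equal stub is a placeholder for the NeSy engine call, here approximated by a trivial case-insensitive match.

```python
class Symbol:
    def __init__(self, value):
        self.value = value

    def _semantic_equal(self, other):
        # Stub for the NeSy engine fallback; a case-insensitive string match
        # stands in for an LLM-based semantic comparison.
        return str(self.value).strip().lower() == str(other).strip().lower()

    def __eq__(self, other):
        other_value = other.value if isinstance(other, Symbol) else other
        if self.value == other_value:             # strict comparison first
            return True
        return self._semantic_equal(other_value)  # semantic fallback

Symbol("Hello") == "hello"  # strict comparison fails, semantic fallback succeeds
```

The same pattern generalizes to other overloaded operators: try the type-specific primitive first, and only defer to the probabilistic engine when the strict evaluation fails.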
Upon invoking an operator or method, the respective primitive function evaluates the symbol's specific type and its respective attributes, and if necessary, resolves a nested decorated function that then uses the NeSy engine for evaluation. Should the evaluation fail, a predefined fallback implementation executes. Absent a fallback, or if both evaluations fail, an error state is raised. The processing of an operator or custom method involves a pipeline consisting of pre- and post-processing steps, as well as constraint enforcement. Constraints cover aspects like return types, value ranges, and structural integrity (e.g. JSON formatting through grammar-based verification). In Figure 3 b) we give an overview of the entire prompt composition based on the user input, the Symbol object structure, and in part c) the Symbol evaluation pipeline.
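The pre-/post-processing pipeline with constraint enforcement described above might look roughly like this (all components are illustrative stand-ins, not the library's actual classes):

```python
import json

def pre_process(user_input: str) -> str:
    # Reformat the input to adhere to a DSL-specific structure.
    return f"// TASK: {user_input.strip()}"

def engine(prompt: str) -> str:
    # Stand-in for the NeSy engine; a real call would query an LLM.
    return json.dumps({"prompt": prompt, "answer": "42"})

def post_process(raw: str) -> dict:
    # Restructure the engine output to match the desired output format.
    return json.loads(raw)

def check_constraints(result: dict) -> dict:
    # Constraint enforcement, e.g. structural integrity of the output.
    if "answer" not in result:
        raise ValueError("constraint violated: missing 'answer' field")
    return result

def evaluate(user_input: str) -> dict:
    # Full pipeline: pre-process -> engine -> post-process -> constraints.
    return check_constraints(post_process(engine(pre_process(user_input))))
```

If a constraint fails, a fallback implementation would run before raising an error state, as described above.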
Self-Referential Structures SymbolicAI augments the generative process by enabling systems to introspect and modify their behavior dynamically. We leverage LLMs to execute tasks based on both natural and formal language instructions, adhering to the specified user objectives and with innate self-referential structures. We derive subtypes from Expression and enclose their functionalities in task-specific components, which we then expose again through
Figure 3: a) Illustration of polymorphic context on the example of a SQLExpression type for the add-operator. Without a polymorphic context a regular Expression evaluation concatenates two Symbol objects together. The polymorphic context in SQLExpression overwrites the base behavior such that two added SQL-expressions get semantically combined, not concatenated. b) Illustration of the translation of a Symbol object to a prompt statement to be processed by an LLM in the NeSy engine. The User Input Args can be attached with a Payload from previous executions and gets applied to the Custom Method. The user input with the polymorphic context of the Symbol Object attributes (Static Context and Dynamic Context) are translated to a prompt statement according to the schema of the Prompt Design. The fields Operation, Examples and Template mark operation description, DSL-based prompt examples and template structures respectively. These translations are processed according to PreProcessor and engine-specific formatting. c) Illustrates the evaluation pipeline from user input to output, with multiple translation processes before and after the Engine invocation. The Input gets passed to the Custom Method and reformatted according to a PreProcessor to adhere to DSL-specific structure. The engine then takes the output of the PreProcessor and composes the final prompt according to the engine-specific Prompt Design and resolves polymorphic context and auxiliary fields. The output of the Engine then can be restructured by a Post Processor to match DSL-requirements of the desired Output and gets applied Constraints to verify the outcome.
templating and the model-driven design of the NeSy engine. This design choice allows a system to create and utilize its own sub-process definitions, analogous to concepts discussed in Schmidhuber (2007; 2009). Concretely, we utilize generalization properties of LLMs to interpret and formulate a set of operations that incorporate self-instructions (Wang et al., 2022). Consequently, the operations hold the flexibility to adapt to the context, and derive sub-processes that self-instruct LLMs to engage in situational modeling and context-sensitive problem-solving. Ultimately, this enables the construction of hierarchical computational graphs for self-referential meta-reasoning systems without the need to explicitly train a meta-learner (Kirsch & Schmidhuber, 2022). In Figure 4 we illustrate a step-wise evaluation of a contextual computational graph, in which the NeSy engine is processing conditioned on the current execution context and producing a next symbol prediction.

6 PERFORMANCE MEASURE

One of the challenges when creating multi-step generative processes with LLMs as part of NeSy engines lies in model evaluation and handling irrelevant predictions. A naive assessment that measures only task success would score all models at zero and render them unusable. Even if models follow instructions and produce parts of the expected solution, we regularly observe that they - especially open-source models - append a continuation of task-irrelevant predictions. Such predictions result in failure modes when applying conditions and validations, and halt more nuanced diagnostics as well as the possibility of continuing the evaluation despite intermediate failures. To derive our quality measure, we borrow ideas from the utilization of the Fréchet distance for generative processes (Heusel et al., 2017).
Figure 4: We showcase a multi-step hierarchical computational graph, with each node in the graph represented by a symbol. The edges are relations between symbols. The left-hand side illustrates how a new node (Symbol 3) is obtained by evaluating an operation with its respective context on a NeSy engine. The right-hand side illustrates the context information window (yellow rectangle) and relationship of the resulting graph with its respective nodes.
We generate trajectories through a NeSy sequential process that creates a trajectory of distributions P over multiple desired behaviors. To quantify the validity of the generated trajectories, we measure the total distance between the generated and reference data distributions along the path trajectory. We therefore adopt a cumulative measure capable of taking into account the entire generative trajectory. In theory, this process would entail calculating the path integral over the latent space representations of the models, cumulating the Fréchet distances (Dowson & Landau, 1982) traversed
along these trajectories:
\[
D(P_{\mathrm{gen}}, P_{\mathrm{ref}}) = \int_{t_0}^{t_f} d\big(\mathcal{N}(m_t, C_t), \mathcal{N}(m_{w,t}, C_{w,t})\big)\, dt \tag{2}
\]
where $D(P_{\mathrm{gen}}, P_{\mathrm{ref}})$ denotes the integral of the Fréchet distances between two data distributions along the generative path trajectory from an initial time $t_0$ to a final time $t_f$, and $d(\mathcal{N}(m_t, C_t), \mathcal{N}(m_{w,t}, C_{w,t}))$ is the Fréchet distance calculated at each time $t$ between the generated multivariate normal data distribution with mean $m_t$ and covariance $C_t$, and the reference multivariate normal data distribution with mean $m_{w,t}$ and covariance $C_{w,t}$. The resulting measure follows properties of normal distributions and is consistent with increasing disturbances.
However, this approach is computationally intractable for large-scale problems, and requires access to latent representations, which - especially in the context of LLMs - is not always given. For computational feasibility, we introduce an approximation that measures the embedding distances over the path trajectories through an auxiliary embedding model, based on prior work on distribution regression (Szabó et al., 2016). The embedding model maps the symbolic representations into an RKHS, such that we can apply a kernel mean embedding function to measure their respective distances (You et al., 2019; Dinu et al., 2023). We assess the distance through the mean embeddings w.r.t. a kernel function $K(\cdot,\cdot)$ of the samples $e_{x_t} \sim \nu_{\mathrm{gen}}^t \sim P_{\mathrm{gen}}$ and $e_{y_t} \sim \nu_{\mathrm{ref}}^t \sim P_{\mathrm{ref}}$ produced by the generated data distribution and a reference data distribution, respectively. We denote by $\mu_{e_{x_t}}, \mu_{e_{y_t}}$ the mean embeddings associated to the respective samples, i.e. $\mu_{e_{x_t}}(z) = \frac{1}{n}\sum_{i=1}^{n} K(x_i^t, z)$ in case $e_{x_t} = (x_i^t)_{i=1}^{n}$ is a sample of size $n$. To compute the similarity between the embeddings of the generated and reference distributions, we evaluate the associated maximum mean discrepancy $\mathrm{MMD}^2(\mu_{e_{x_t}}, \mu_{e_{y_t}})$ (Gretton et al., 2012) and then, as before for the Fréchet
distances, we integrate over $t$:
\[
\tilde{D}(P_{\mathrm{gen}}, P_{\mathrm{ref}}) = \int_{t_0}^{t_f} \mathrm{MMD}^2(\mu_{e_{x_t}}, \mu_{e_{y_t}})\, dt. \tag{3}
\]
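A numerical sketch of the per-step MMD² term in Equation (3), using a Gaussian kernel on toy embedding vectors (a pure-Python stand-in for the actual embedding model; note that the final VERTEX score, as discussed below, ultimately relies only on the cross terms, whereas this sketch computes the full biased estimator):

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    # K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def mmd2(xs, ys, sigma=1.0):
    # Biased empirical estimate of MMD^2 between two embedding samples.
    n, m = len(xs), len(ys)
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / n**2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / m**2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (n * m)
    return k_xx + k_yy - 2 * k_xy

gen = [(0.0, 0.1), (0.1, 0.0)]   # toy "generated" embeddings
ref = [(0.0, 0.0), (0.1, 0.1)]   # toy "reference" embeddings
```

Identical samples yield an MMD² of zero, and the value grows as the two distributions drift apart.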
We then use the properties of the MMD to derive a similarity score, which follows the same statistical principles as the MMD and is bounded between $[0,1]$. We concluded that we can utilize only the MMD cross terms to evaluate the similarities. See Appendix Section B for more details. For our comparisons as referenced in Figure 6, we therefore denote the similarities rather than distances. We then arrive at the following formulation and refer to our empirical measure as the "Vector Embedding for Relational Trajectory Evaluation through Cross-similarity", or VERTEX score for short:
\[
s(P_{\mathrm{gen}}, P_{\mathrm{ref}}) := \int_{t_0}^{t_f} \Big[ \min\Big( \max\Big( 0, \tfrac{1}{z}\big( \widetilde{\mathrm{MMD}}^2(\mu_{e_{x_t}}, \mu_{e_{y_t}}) - z_{\mathrm{rand}} \big) \Big), 1 \Big) \Big]\, dt. \tag{4}
\]
We approximate the integral across time steps through Monte Carlo approximation. The introduced normalization constants denote the similarity to a random sequence $z_{\mathrm{rand}}$, which functions as a baseline subtraction to recenter our results, and a given reference score $z$ to rescale w.r.t. scores obtained from comparing related solutions. Min-max scaling ensures the final measure is bounded between $[0,1]$. This process reflects properties such as Hölder continuity, which bounds the kernel function within certain limits. To compute the embeddings, we utilize the embedding model all-mpnet-base-v2 (Song et al., 2020), due to its widespread availability and its balance between speed and quality. As a similarity measure, we select a Gaussian kernel following our derivation from Appendix Section B. In our implementations, we also explore other kernels, including preliminary experiments with cosine similarity. We also note that one can integrate Bernoulli-distributed trials into our score, with values of 0 representing failure modes and values of 1 representing successes. Furthermore, if we relax our definition, we can integrate other similarity measures bounded between $[0,1]$ that reflect domain-specific attributions, e.g. a similarity measure tailored towards capturing the nuances between two sub-structures of an abstract syntax tree.
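One plausible reading of the per-step normalization in Equation (4) - baseline subtraction by the random-sequence score, rescaling by a reference score, min-max clamping to $[0,1]$, and a Monte Carlo average over time steps - can be sketched as follows (illustrative only; the exact normalization constants are derived in the Appendix):

```python
def clamp01(v: float) -> float:
    # Min-max clamping keeps each per-step score inside [0, 1].
    return min(max(0.0, v), 1.0)

def vertex_score(step_sims, z_rand, z_ref):
    # step_sims: per-time-step cross-similarities along the trajectory.
    # z_rand:    similarity to a random sequence (baseline subtraction).
    # z_ref:     score from comparing related solutions (rescaling).
    normalized = [clamp01((s - z_rand) / (z_ref - z_rand)) for s in step_sims]
    return sum(normalized) / len(normalized)  # Monte Carlo time average
```

A score near 1 indicates the generated trajectory tracks the reference closely at every step; a score of 0 is indistinguishable from a random sequence.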

7 EVALUATION

We introduce a benchmark that evaluates multi-step generative processes as complex workflows. Our benchmark consists of five different evaluation categories, and uses the VERTEX score to measure the capabilities of an LLM to solve tasks from each category. The five categories of our benchmark are divided into three basic evaluations and two advanced evaluations. The three basic categories are (i) associative prediction, which measures a model's proficiency in understanding associations between symbols, (ii) multi-modal binding, where we evaluate the capability to employ tools and operate on different modalities, and (iii) program synthesis, for measuring a model's proficiency in generating consistent code and executing subroutines. The two advanced benchmark categories are (iv) logic, for evaluating logic-based components, and (v) computational graphs, where complex workflows need to be processed, evaluating all aforementioned capabilities. For our evaluation we focus on the GPT family (Brown et al., 2020) of models, specifically GPT-3.5 Turbo (revision 1106) and GPT-4 Turbo (revision 1106), as they are the most proficient models to date; Gemini-Pro (Google, 2023) as the best performing model available through API from Google; and LLaMA2-Chat 13B (Touvron et al., 2023), LLaMA3-Chat 8B, and LLaMA3-Chat 70B as baselines for revised and fine-tuned open-source models. The open-source models Mistral, Zephyr, and smaller LLaMA variants are estimated to have roughly equivalent parameter counts compared to GPT-3.5 Turbo and Gemini-Pro. All our experiments require a context size smaller than or equal to 4096 tokens to enable comparisons of the in-context capabilities across model architectures. For the LLaMA models, we utilize the chat versions, since they are specifically tuned to follow instructions.
Associative Prediction We evaluate a model's proficiency in following simple and complex instructions and associations with zero- and few-shot examples. To this end, we evaluate the proficiency in applying our operators between Symbol types. We defined a total of 15 tasks involving in-context associations between two Symbol instances. SymbolicAI's overloaded operators rely on predefined pseudo-grammars, as described in Section 4, that augment the operators with few-shot examples. For instance, the overloaded operator + utilized between two Symbol instances provides few-shot examples of how to resolve additions with various data types. Consequently, we can now test whether the models can solve the addition between Symbol("two hundred and thirty four") and Symbol(7000). See Appendix Section F.1 for more details.
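How the pseudo-grammar might augment the + operator with few-shot examples can be sketched as follows (the prompt format and example pairs are illustrative, not SymbolicAI's actual DSL):

```python
# Hypothetical few-shot examples attached to the overloaded `+` operator.
FEW_SHOT_ADD = [
    ("1 + 2", "3"),
    ("'one' + 2", "3"),
    ("'seventeen' + 'three'", "20"),
]

def compose_add_prompt(a, b) -> str:
    # Compose the operation description, DSL-based examples, and query.
    lines = ["Add the two values and return only the result."]
    lines += [f"Example: {q} => {r}" for q, r in FEW_SHOT_ADD]
    lines.append(f"Query: {a!r} + {b!r} =>")
    return "\n".join(lines)

prompt = compose_add_prompt("two hundred and thirty four", 7000)
```

The NeSy engine would then submit such a prompt to the LLM and parse the completion back into a Symbol.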
Multi-modal Binding We perform transformations between multiple modalities through language-based representations. Therefore, we need to evaluate the model's proficiency in tool utilization, classification and routing of requests to relevant modules. We define a multi-modal Expression to detect the category of a task based on its content and to forward the task to the appropriate tool. The expression creates interfaces to tools like WolframAlpha for mathematical expressions, Selenium for website content scraping, SerpApi for search queries, and APILayer for optical character recognition. Each of the five tests aims to evaluate the appropriate handling of a specific type of input by the multi-modal Expression type, such as processing a website URL for scraping, interpreting a search engine query, testing if two vectors are linearly independent, comparing large numbers, and extracting text from an image. See Appendix Section F.2 for more details.
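A toy version of the category detection and routing (the real Expression delegates classification to the NeSy engine; here a keyword heuristic stands in for the LLM call, and the tool names follow the list above):

```python
def route(task: str) -> str:
    # Classify the request and forward it to the matching tool interface.
    t = task.lower().strip()
    if t.startswith(("http://", "https://")):
        return "selenium"        # scrape website content
    if t.endswith((".png", ".jpg", ".jpeg")):
        return "apilayer_ocr"    # optical character recognition
    if any(kw in t for kw in ("linearly independent", "compare", "solve")):
        return "wolframalpha"    # mathematical expression
    return "serpapi"             # default: search engine query

assert route("https://example.com/article") == "selenium"
assert route("Are (1,0) and (0,1) linearly independent?") == "wolframalpha"
```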
Program Synthesis We evaluate executable code with and without concepts from retrieval augmented generation, model-driven development, and experiment with self-generated instructions by creating self-referential expressions. We designed three separate tests related to program synthesis, where each task assesses the ability of the models to generate and execute code based on natural language instructions or provided templates:
1) The first task involves reading a LaTeX table template and data, then generating a function to populate the table with the given data.
2) The second task tests the automatic code generation for API calls by fetching data from a specified URL and extracting specific information from the retrieved content.
3) The third task evaluates the ability to construct a custom Expression that processes a Symbol through a specific Function component from the SymbolicAI package.
Each of the three tests follows a similar pattern, where the generated code is scored based on its similarity to valid references and normalized with random samples. See Appendix Section F.3 for more details.
Logical Components To evaluate the logical reasoning capabilities of models, we condition them to create a sequence of expressions as self-contained components, and refer to higher-order logic for their assessment. Based on the underlying type theory originating from Whitehead & Russell (1925-1927), we evaluate a model's capability to resolve statements of the form "there exists x such that x fulfills y". Such quantifiers define the standard semantics of expressions, where their meaning is given by a semantic function. A semantic function maps a term from an abstract definition to a point in a domain, which is an interpretation of the term's type and value. Therefore, these functions operate on types and values of expressions, and relations thereof. Subsequently, NeSy engines can formulate and evaluate logic-based instructions at inference time through Lisp, Prolog, or Mathematica (McCarthy, 1959; Colmerauer & Roussel, 1993; Chen et al., 1993; Inc., 2022), or leverage solvers such as Z3 (Moura & Bjørner, 2008). Therefore, the result of a natural language statement evaluated by a NeSy engine can be interpreted by any expert system that defines the corresponding semantic functions and processes them either in a symbolic (Feigenbaum et al., 1965; Gamble et al., 1994), differentiable (Veličković & Blundell, 2021; Ibarz et al., 2022), or hybrid manner (Kuncicky et al., 1991).
We evaluate how proficient models are at interpreting custom DSLs and defining expression statements. DSLs are designed to express logical relations and operations in a structured format that supports human-readable and machine-processable representations. As an example, consider translating the following natural language statement into an expression statement:
Marvins has four paws and likes to meow when I pet its fur. Is Marvins a cat?
A DSL may enforce the usage of HAS(), IS(), etc., and may condition an LLM to produce the following expressions:
  • HasFourPaws(x): x has four paws.
  • LikesToMeowWhenPetted(x): x likes to meow when it is petted.
  • IsCat(x): x is a cat.
These are then utilized to define the following logical expression:
$\forall x\,(\mathrm{HasFourPaws}(x) \wedge \mathrm{LikesToMeowWhenPetted}(x) \rightarrow \mathrm{IsCat}(x)).$
An automated theorem prover can now evaluate this statement for all values of x and assess the validity of the original query. Lastly, our evaluation uses symbolic mathematics to manipulate algebraic expressions. This involves defining symbols and performing operations like factorization, simplification, and algebraic manipulation. The symbols are placeholders for any value, enabling the definition of general expressions without specifying their values upfront.
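The quantified statement above can be checked mechanically over a finite domain, which is what a theorem prover does in a far more general way (the domain entries and predicate values here are purely illustrative):

```python
# Finite-domain check of:
#   forall x (HasFourPaws(x) AND LikesToMeowWhenPetted(x) -> IsCat(x))
DOMAIN = {
    "marvins": {"four_paws": True, "meows_when_petted": True, "cat": True},
    "rex":     {"four_paws": True, "meows_when_petted": False, "cat": False},
}

def implies(p: bool, q: bool) -> bool:
    # Material implication: p -> q is false only when p holds and q fails.
    return (not p) or q

def statement_holds(domain) -> bool:
    return all(
        implies(e["four_paws"] and e["meows_when_petted"], e["cat"])
        for e in domain.values()
    )
```

If every entity satisfying the antecedent is also a cat, the universal statement holds, and the original query about Marvins is answered affirmatively.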
We designed six tests to assess the logical capabilities of the candidate models and group them as follows. See Appendix Section F.4 for more details.
1) We utilize the Python library SymPy for symbolic mathematics to create the mathematical expression $ax + bx - cx - ay - by + cy + d$. The task for the model is then to factorize the expression and extract all unique symbols.
2) Three tasks evaluate a model's capability to resolve the logical operations AND, OR, and XOR. For instance, the test for logical AND combines the symbols Symbol("The horn only sounds on Sundays") and Symbol("I hear the horn") and compares the answer against human-generated references. Given the range of possible solutions, there is high variability in the solution space, and each model might prefer a different solution.
3) For another task we use a custom Expression that defines a DSL syntax and semantic structure. We use this Expression to extract higher-order logic expressions from a natural language statement, namely the puzzle 'Who is Jay's brother?'4, that preserve the original relationships.
4) For the final task, we again use the puzzle 'Who is Jay's brother?' to evaluate a model's capability for complex conversions. We use the Z3 theorem prover (Moura & Bjørner, 2008) to solve the 'Who is Jay's brother?' puzzle conditioned on the Z3 solver's solution to Einstein's famous puzzle 'Who owns the fish?'. The task involves an indirect translation from natural language to executable code for the Z3 solver; the solution to Einstein's puzzle acts as a form of self-contained "documentation" for how the Z3 solver should be utilized. The test constructs a template, which includes the task instructions, puzzle statement, and reference to Einstein's puzzle solution. The models are then asked to analyze the given problem and solution format and create a Python function with Z3 syntax that can solve the 'Who is Jay's brother?' puzzle. The dynamically generated code is executed within the test environment utilizing Python's exec function. We check the access to the Z3 solver and run the generated solve_puzzle function, which is supposed to contain the logic to solve the puzzle. Once executed, the assembled Z3 logical clauses are processed by the solver, which verifies that the set of constraints is satisfiable. If so, the model generated by the solver is queried for the puzzle's solution and scored using our VERTEX score.
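The dynamic-execution harness follows the standard exec pattern; a sketch with a trivial stand-in for the generated body (a real run would assemble and query Z3 clauses instead of returning a constant):

```python
# The model-generated source is executed in an isolated namespace and
# must define `solve_puzzle`, as described in the text above.
generated_code = '''
def solve_puzzle():
    # Stand-in body; real generated code would assemble Z3 constraints,
    # check satisfiability, and query the resulting model.
    return "John"
'''

namespace = {}
exec(generated_code, namespace)   # compile and bind solve_puzzle
answer = namespace["solve_puzzle"]()
```

The returned answer is then compared against references via the VERTEX score.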
Hierarchical Computational Graphs We evaluate the capabilities of models to orchestrate a multi-step generative process and evaluate a set of tasks. Models need to direct sub-processes and associate computational results from and to Symbol nodes, and maintain relationships between these nodes, which we refer to as a computational graph as shown in Figure 5. In a computational graph, the VERTEX score compares the results produced by a generative model at each node against samples obtained from a reference distribution, usually modeled by sampling from multiple valid references. We also account for randomness through predefined random samples for normalizing the result. Our reference to hierarchical computational graphs stems from the fact that we operate on multiple levels. On a higher level of abstraction we are able to perform planning, sub-task scheduling, and define operational instructions. On a lower level of abstraction, we execute these plans based on the defined instructions and data, which can also span generative processes that produce new information.
Figure 5: We illustrate the hierarchical computational graph for the Paper expression. Each node represents an instance of an expression with distinct properties and behaviors, such as file sourcing, generative process, tool utilization, or transformation operation. The edges denote the reference relationships between expressions and indicate the flow of information. The blue highlighted nodes mark the main sequence nodes of expressions utilized to create parts of the paper draft, such as the Method section, RelatedWork section, Abstract section, and so on. Each generative node is used for evaluating the VERTEX score. Non-generative nodes such as search engine results are not evaluated, and we assume ground-truth values for them.
Given that the field is currently at an early stage in developing even sequential schedulers for LLM-based planning systems, our evaluations will be confined to sequential execution only. We introduce two tests designed to evaluate multi-step generative processes:
1) We simulate and evaluate the process of writing a research paper draft based on a predefined hierarchical computational graph that focuses on the content output of the computational graph rather than planning and scheduling functionality. See Appendix Section F.5 for more details.
2) We test the VERTEX Protocol as defined in Algorithm 1, which represents our general method for evaluating multi-step generative processes. We create a self-contained test scenario to illustrate an end-to-end evaluation and as a go-to reference for how our protocol can be deployed in a realistic environment. Our evaluation protocol is not only designed to analyze and score a series of instructions, but also to provide a structured basis for recording these

4 Bob has two sons,John and Jay. Jay has one brother and father. The father has two sons. Jay’s brother has a brother and a father. Who is Jay's brother?

processes. Furthermore, we note that our evaluation protocol is generally formulated, which allows the application of non-sequential planning and scheduling.

Algorithm 1 VERTEX Protocol
Require: NeSy engine V as an LLM; embedding engine E: H → R^d; symbols {x_0, x_•, y_•}, with x_0 as the initial instruction, x_• as the payload resulting from executing V, y_• as the reference, and • acting as a placeholder for P, T, C; capabilities C = {F_1, F_2, F_3, ...}, where each F_i represents a specific functional role within the system; plan P; task T ∈ P; memory buffer M; a scoring function s~: H × H → [0,1]; a scheduler Q; an aggregator A; and score variables {s_•} ⊂ [0,1].
Method:
1: V, E, Q, C, y_P ← INIT()                  ▷ Initialize engines, scheduler, capabilities, expected plan.
2: M, A ← ∅                                  ▷ Initialize memory buffer and aggregator.
3: x_P ← GENERATEPLAN(x_0, V)                ▷ V generates plan based on initial instruction.
4: EVALUATE(x_P, y_P, E, A, s~)              ▷ Embed, score, and aggregate plan similarity.
5: P, M ← UNFOLDPLAN(y_P, M, Q)              ▷ Q unfolds plan into actionable tasks and updates progression.
6: while P ≠ ∅ do                            ▷ Run until list of tasks is exhausted.
7:    T, y_C, y_T ← SELECT(M, V)             ▷ V selects next task based on task progression.
8:    F_i ← IDENTIFY(T, C, V)                ▷ V identifies task-related capability F_i.
9:    x_C, x_T ← EXECUTE(T, F_i, Q)          ▷ Q executes T with capability F_i and assigns results x_C, x_T.
10:   EVALUATE(x_C, y_C, x_T, y_T, E, A, s~) ▷ Embed, score, and aggregate capability similarity.
11:   P, M ← UPDATE(T, P, M, Q)              ▷ Q updates plan and progression.
12: end while
13: s ← FINALIZE(A)                          ▷ Finalize aggregation of scores.
14: return s                                 ▷ Return aggregated score of plan execution.
Algorithm 1: This algorithm defines the pseudocode of our VERTEX protocol with our respective VERTEX score as the scoring criterion. We start by initializing the NeSy engine V, the embedding engine E, the scheduler Q, and a set of capabilities C. The initial instruction x_0 is utilized to generate a plan x_P through V. The plan and its expected outcome y_P are embedded, and their similarity is scored according to our VERTEX score and aggregated. The plan is then unfolded into actionable tasks. Each task T is selected and executed with the appropriate capability F_i, resulting in the capability and task results x_C, x_T, and expected outcomes y_C, y_T, being updated in the memory buffer M. The process continues, with each task's result being embedded, scored, and aggregated until the plan is complete. The final aggregated score s is returned, reflecting the overall effectiveness of the plan execution.
We start with a high-level workflow description which consists of a list of tasks and optionally their respective subtasks; we refer to this as the plan P. To perform the experiment, we utilize an expected plan yP which was handcrafted for this evaluation. The expected plan is a queue of predefined tasks (in a particular order) that the system should follow to achieve the goal. The goal statement defines the end objective that the workflow aims to accomplish. We also have a set of plans similar to the expected plan, which are trajectories in the solution space, as well as the plan xP that the LLM generates utilizing the GENERATEPLAN call for a specific seed. We score the predicted plan against the expected plan and the trajectories, then we continue to the next phase, in which we utilize the expected plan to execute the tasks. At each step, the LLM receives in its context the goal, the tasks, the current progress, and a query asking for the next task to execute; we refer to this as the memory buffer M. If the LLM is not able to predict the next task, it returns a failure, and the expected plan is utilized to execute the next task. The LLM has access to a predefined set of capabilities C, specifically WolframAlpha, SerpApi, Selenium, and the LLM itself, which also represents our self-referential structure. We keep executing tasks until the queue is exhausted, and at each step, we utilize the EVALUATE call to measure the performance of the LLM with our VERTEX score. The scheduler class Q oversees the execution of the test workflow. It takes the setup configuration and orchestrates the linear execution of tasks, utilizing the expected plan as a reference. It maintains a pool of tasks to be executed and updates progress as tasks are completed. The UNFOLDPLAN call is a method of the scheduler class Q; the method calls itself recursively until there are no tasks left.
The SELECT call is responsible for determining which task to execute next from a pool of remaining tasks. It utilizes the LLM through self-reflection (Shinn et al., 2023) to choose the most suitable next task based on a template that gets progressively updated in the memory buffer M by the UPDATE call. The IDENTIFY call uses self-reflection and similarity scoring to determine the best interface for the task at hand. The final score therefore reflects an aggregated assessment of the model's ability to manage and execute the workflow.
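The control flow of Algorithm 1 can be sketched in plain Python. Everything below is an illustrative stand-in, not the framework's actual API: the character-frequency `embed`, the clipped cosine `score`, and the mock capability merely substitute for the real embedding engine E, scoring function s~, and capabilities C.

```python
import math
from collections import deque

# Toy embedding: map a string to a fixed-length character-frequency vector.
def embed(text, dim=26):
    vec = [0.0] * dim
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

# Scoring function s~ : H x H -> [0, 1], here a cosine similarity clipped at 0.
def score(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return max(0.0, dot / (nu * nv))

class Aggregator:
    """Collects per-step similarity scores and finalizes their mean."""
    def __init__(self):
        self.scores = []
    def add(self, s):
        self.scores.append(s)
    def finalize(self):
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

def vertex_protocol(generated_plan, expected_plan, execute, expected_results):
    """Simplified VERTEX loop: score the plan, then score each executed task."""
    agg = Aggregator()
    # Plan-level evaluation: embed and score generated vs. expected plan.
    agg.add(score(embed(generated_plan), embed(" ".join(expected_plan))))
    # UNFOLDPLAN: turn the expected plan into a queue of actionable tasks.
    tasks = deque(expected_plan)
    while tasks:                      # run until the task list is exhausted
        task = tasks.popleft()        # SELECT the next task
        result = execute(task)        # EXECUTE with the identified capability
        # Task-level evaluation against the expected result.
        agg.add(score(embed(result), embed(expected_results[task])))
    return agg.finalize()             # FINALIZE: aggregated score in [0, 1]

# Usage: a mock "capability" that simply echoes the task name.
expected = ["search weather", "summarize results"]
refs = {t: t for t in expected}
s = vertex_protocol("search weather summarize results", expected, lambda t: t, refs)
print(round(s, 2))  # → 1.0, since plan and results match their references
```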
In Figure 6 we conclude our evaluation and compute the cumulative score for all described evaluation categories; in the next section we discuss how to interpret the results of our framework.
Figure 6: We evaluate GPT-4 Turbo, GPT-3.5 Turbo, Gemini-1.0 Pro, LLaMA2-Chat 13B, LLaMA3-Chat 8B, LLaMA3-Chat 70B, Mistral 7B and Zephyr 7B on five benchmark categories: 1) Associative Prediction (Association), 2) Multi-modal Binding (Modality), 3) Program Synthesis (Code), 4) Functional Logic Components (Logic), and 5) Hierarchical Computational Graphs (Graphs). We denote the VERTEX scores for each category as a normalized value between 0 and 1, where higher values are better. The VERTEX score is measured against a reference baseline, normalized by random sequences to exclude noise, and rescaled by similarities among the reference distributions. The shown scores are an average over all tests per category and across 8 different seeds per test.
Benchmarks     GPT-4 Turbo  GPT-3.5 Turbo  Gemini 1.0 Pro  LLaMA 2 13B  LLaMA 3 8B  LLaMA 3 70B  Mistral 7B  Zephyr 7B  Random
Associations   0.94         0.51           0.93            0.40         0.46        0.57         0.46        0.27       0.08
Modality       0.83         0.65           0.81            0.50         0.43        0.63         0.39        0.43       0.07
Code           0.75         0.63           0.61            0.13         0.25        0.30         0.19        0.13       0.00
Logic          0.48         0.46           0.28            0.46         0.21        0.11         0.53        0.47       0.00
Graphs         0.39         0.34           0.26            0.06         0.05        0.15         0.03        0.03       0.00
Total          0.68         0.52           0.58            0.31         0.28        0.35         0.32        0.27       0.03

8 DISCUSSION

In this section, we address the evaluation results, auxiliary findings and limitations of SymbolicAI, and the future directions we are focusing on. Some of the limitations stem from the inherent constraints of current technologies and dependencies on third-party systems. Additionally, the nuanced complexities of working with generative models present further challenges.

8.1 RESULTS

In Figure 6 we show the VERTEX score for all five evaluation categories on 8 different state-of-the-art models. We show the aggregated results per category, meaning the average score among all tests per category, averaged across 8 different seeds per test. The VERTEX score is normalized between 0 and 1, where higher values are better. Our score is non-linear due to its use of non-linear kernels, and captures semantic, ordinal and relative structures among the data samples. However, since our score is highly dependent on the quality of the underlying embedding model, it may fail to capture fine-grained syntactic differences such as 'Hello' vs 'hello'.
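The normalization against a random baseline can be pictured as a small sketch. The exact rescaling used for the VERTEX score is not reproduced here, and the numeric values in the example are illustrative only: the idea is that a raw similarity is rescaled so that the random-sequence floor maps to roughly 0 and the reference ceiling to roughly 1.

```python
def normalize_score(raw, random_baseline, reference_ceiling):
    """Rescale a raw similarity so that random noise maps to ~0 and the
    reference self-similarity maps to ~1; values are clipped into [0, 1]."""
    span = reference_ceiling - random_baseline
    if span <= 0:
        return 0.0
    return min(1.0, max(0.0, (raw - random_baseline) / span))

# Example: a raw embedding similarity of 0.70, a random-sequence
# similarity floor of 0.55, and a reference self-similarity of 0.95.
print(normalize_score(0.70, 0.55, 0.95))  # ≈ 0.375
```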
In our experiments, we have noticed that for associative predictions and multi-modal bindings, GPT-4 Turbo is on par with Gemini-1.0 Pro. Furthermore, there is still a large gap between open-source contestants such as LLaMA 3, even with 70B parameters, and the closed-source alternatives from OpenAI and Google. For the rest of the experiments, we see that GPT-4 almost always outperforms all other models, except in the functional logic components category. Here, we analyzed the results and found that the larger models sometimes take shortcuts by automatically returning the solution and answering that the task instructions are too complex for such a straightforward puzzle query. However, we would rather state in general that for logic-based, planning and scheduling tasks all models act unreliably, even if slight performance differences between the models are visible in the plot. We believe this is in part due to a lack of training data specifically for workflow, planning and scheduling tasks, and to imprecision in generating reliably structured output formats, such as custom DSLs or other custom in-context instructed formats. This also stems from their instruction fine-tuning, since most models are chat-based models and offer verbose responses which need to be suppressed or post-processed.
We see similar performance between GPT-3.5 Turbo and LLaMA 3 70B except for the logic and graph evaluations. We found that LLaMA 3 70B has a tendency to ask questions back if it does not understand the request instead of following the specified instructions provided. We assume this also stems from the chat-based instruction fine-tuning. Zephyr 7B and Mistral 7B have shown capabilities on par with larger models in functional logic components, however they fail in the program synthesis and hierarchical computational graph experiments. We observe that they perform well when resolving the overloaded logic operators such as OR, AND and XOR, and show decent performance for text generation, but fail to resolve more complex instructions.

8.2 LIMITATIONS

Framework Since the framework interfaces with many tools and API services, it requires substantial engineering effort. Although we support a variety of tools and frameworks like Selenium, WolframAlpha, or Z3, we only scratch the surface of these tools. Moreover, the utilization of grammar-based constraint validation is still experimental and limited in functionality to specific formats such as JSON and HTML. Finally, we also encounter challenges related to the parallelization and multiprocessing of prompts, since concurrent execution is non-trivial, especially with the intricacies of Python process management.
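As a hedged illustration of the constraint-validation idea for JSON (the grammar-based validation in the framework is more general, and the `validate_json` helper below is not part of its API), one can check a model's raw output for parseability and required keys before accepting it:

```python
import json

def validate_json(raw, required_keys):
    """Check that raw model output parses as JSON and carries the
    required keys. Returns (ok, payload-or-error-message)."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as err:
        return False, f"not valid JSON: {err.msg}"
    missing = [k for k in required_keys if k not in payload]
    if missing:
        return False, f"missing keys: {missing}"
    return True, payload

# Usage: a well-formed model response passes; malformed ones report why.
ok, payload = validate_json('{"task": "search", "args": ["weather"]}',
                            ["task", "args"])
print(ok, payload["task"])  # → True search
```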
Embedding Measure Our empirical measure is limited by the expressiveness of the embedding model and how well it captures the nuances in similarities between two representations. Furthermore, the obtained similarity scores are highly non-linear and difficult to interpret. For instance, two representations may address the same topic, such as a problem description and its respective solution, yet when measuring their similarity we obtain a score of only 70%. We normalize this by subtracting an inherent baseline and randomness effect; however, to ensure a more holistic and robust measurement we would need a significantly larger number of baselines and experiments. We were very limited in the availability of development resources, and some of the presented models are only accessible behind costly API paywalls. We are actively seeking sponsors to scale our solution and offer a more compelling evaluation.
Model Capabilities An obvious limitation revolves around the fixed context window size of the underlying language models. Despite the expansion of the context window in newer models such as GPT-4, the finite context still restricts the amount of data that can be processed in a single pass. All information outside the context needs to be added through information retrieval approaches, which come with their own challenges and limitations (Gao et al., 2023). This leads to side effects, including hallucination when the model does not contain the necessary information to answer the prompted instruction, which makes it difficult to maintain long-term statefulness for complex reasoning tasks and computational graphs.
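A minimal sketch of the retrieval idea, assuming a toy bag-of-words embedding; production retrieval pipelines use learned embeddings and approximate nearest-neighbour indices rather than the exhaustive ranking shown here:

```python
import math
import re
from collections import Counter

def tokens(text):
    # Lowercase and keep alphabetic word tokens only.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def bow_similarity(a, b):
    """Cosine similarity of simple bag-of-words vectors."""
    ca, cb = tokens(a), tokens(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, documents, k=1):
    """Return the top-k documents most similar to the query; only these
    would be placed into the model's limited context window."""
    ranked = sorted(documents, key=lambda d: bow_similarity(query, d),
                    reverse=True)
    return ranked[:k]

docs = [
    "The scheduler unfolds the plan into tasks.",
    "WolframAlpha answers symbolic math queries.",
    "Selenium drives a web browser for scraping.",
]
print(retrieve("which tool answers math queries", docs, k=1)[0])
```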
Error Handling The complexity of error handling when evaluating complex expressions through function compositionality, especially across multiple modalities and different solvers, is another notable challenge. While SymbolicAI introduces mechanisms for error analysis and automated correction, these approaches are not infallible. They are often limited by the quality and expressiveness of the models, and by the model's capacity to understand deeply nested logical constructs. We also note that for our evaluations, we disabled any remedy protocol, such as truncating prompts or retry schemas.
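The disabled remedy protocols can be pictured as a generic retry wrapper; `with_remedy`, its `remedy` truncation step, and the flaky solver below are illustrative assumptions, not the framework's interface:

```python
def with_remedy(call, validate, prompt, max_retries=2, remedy=lambda p: p):
    """Execute `call(prompt)`, validate the output, and retry with a
    remedied prompt (e.g. a truncated one) on failure. Raises once the
    retry budget is spent."""
    last_error = None
    for _ in range(max_retries + 1):
        out = call(prompt)
        ok, info = validate(out)
        if ok:
            return out
        last_error, prompt = info, remedy(prompt)
    raise RuntimeError(f"all retries failed: {last_error}")

# Usage: a flaky "solver" that only succeeds on a short prompt.
calls = []
def flaky(prompt):
    calls.append(prompt)
    return "ok" if len(prompt) <= 10 else "error"

result = with_remedy(
    flaky,
    validate=lambda out: (out == "ok", out),
    prompt="a very long prompt",
    remedy=lambda p: p[:10],  # truncation as the remedial step
)
print(result)  # → ok, after one truncating retry
```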
Generalization This research is also limited by current LLMs' capacity for reasoning and generalization. Although progress has been made, models are still prone to hallucinations and reasoning errors, especially when dealing with abstract, novel, or highly complex problem statements (Marcus, 2020). Furthermore, our framework's reliance on the model's ability to grasp the semantics of operations can be influenced by the training data and the model's innate biases and conceptual understanding (Mahowald et al., 2023). We also point out that the initial development of SymbolicAI started with the GPT family of models, and we may encounter innate biases in prompt design and expressiveness when utilizing other reference models. However, we also point out that prompt engineering of instruction-based statements is not a reliable direction for improvement; we instead advocate for enhancing the resilience of models through fault-tolerant design. A further open question is how to assess the disentanglement of evaluations of models on downstream tasks, to avoid testing on training samples, especially for closed-source solutions like GPT.
Interpretability and Transparency Finally, the issue of explainability and transparency in AI systems remains challenging. While SymbolicAI takes steps towards making computational processes more explicit and explainable, the interpretability of the underlying generative models remains an open problem. This can hinder trust and adoption in sensitive applications where interpretability of predictions is important.

8.3 FUTURE WORK

The goal for Algorithm 1 is to be utilized by an advanced learning agent. This agent, employing reinforcement learning methodologies (Ouyang et al., 2022; Li et al., 2023; Rafailov et al., 2023), could leverage our evaluation measure in existing benchmarks (Milani et al., 2020; Swazinna et al., 2022; Schweighofer et al., 2022) as a means to obtain reward signals that address a central problem in reinforcement learning, namely credit assignment (Sutton, 1984; Arjona-Medina et al., 2019; Holzleitner et al., 2020; Patil et al., 2020; Widrich et al., 2021; Dinu et al., 2022). Over time, it aims to develop the ability to autonomously generate its own plans, efficiently schedule tasks and subtasks, and carefully select the most suitable tools for each task. Our protocol lays the groundwork for this agent to learn and expand its base set of capabilities (Amaro et al., 2023), moving towards more sophisticated, self-referential orchestration of multi-step tasks. We have already noticed that research is shifting towards this type of methodology (Yuan et al., 2024). Furthermore, in Section 7 we only considered a sequential scheduler. However, our objective is to ultimately assess a non-sequential task execution model, allowing for dynamic insertion and out-of-sequence task execution. In addition, we are interested in exploring similarities between our work and Generative Flow Networks (Bengio et al., 2021a;b; Lahlou et al., 2023). Lastly, we also discuss limitations in Section 8.2 with further opportunities for future improvements.

9 CONCLUSION

In this work, we introduced SymbolicAI, a framework that unifies generative models with an array of solvers, blending the strengths of symbolic and sub-symbolic AI paradigms within a cohesive NeSy framework. SymbolicAI equips researchers and practitioners with a comprehensive toolkit to develop contextualized and explainable NeSy AI systems capable of addressing diverse challenges effectively. We also introduce a quality measure and a benchmark for comparing and evaluating a wide range of computational tasks. SymbolicAI provides a basis for further research in advanced program synthesis, hierarchical computational graphs, the development of self-referential systems, and the integration of probabilistic methods with AI design for creating autonomous agents.

ACKNOWLEDGEMENT

The ELLIS Unit Linz, the LIT AI Lab, the Institute for Machine Learning, are supported by the Federal State Upper Austria. We thank the projects Medical Cognitive Computing Center (MC3), INCONTROL-RL (FFG-881064), PRIMAL (FFG-873979), S3AI (FFG-872172), DL for GranularFlow (FFG-871302), EPILEPSIA (FFG-892171), AIRI FG 9-N (FWF-36284, FWF-36235), AI4GreenHeatingGrids (FFG- 899943), INTEGRATE (FFG-892418), ELISE (H2020-ICT-2019-3 ID: 951847), Stars4Waters (HORIZON-CL6-2021-CLIMATE-01-01). We thank Audi.JKU Deep Learning Center, TGW LOGISTICS GROUP GMBH, Silicon Austria Labs (SAL), FILL Gesellschaft mbH, Anyline GmbH, Google, ZF Friedrichshafen AG, Robert Bosch GmbH, UCB Biopharma SRL, Merck Healthcare KGaA, Verbund AG, GLS (Univ. Waterloo), Software Competence Center Hagenberg GmbH, Borealis AG, TÜV Austria, Frauscher Sensonic, TRUMPF, the NVIDIA Corporation and Atlas.
We extend our appreciation to Andreas Windisch and Clemens Wasner of AI Austria for their unwavering support. Their valuable feedback, connections, and facilitation of introductions within their expansive network have been instrumental to the progress of ExtensityAI.
Our gratitude also goes to Sergei Pereverzyev, whose enlightened guidance and thoughtful ideas have been a beacon for our research endeavors. Our thanks are equally extended to Gary Marcus, whose stimulating discussions sparked numerous innovative ideas incorporated into our framework.
We are equally grateful to Markus Hofmarcher, a friend and colleague whose informed counsel and stimulating discussions have significantly sharpened various facets of our study. Additionally, our thanks are due to Fabian Paischer and Kajetan Schweighofer, whose preliminary work and assistance have been of enormous benefit.
We are also grateful to our friends John Chong Min Tan and Tim Scarfe, whose communities have been a hub for exhilarating discussions. Their online presence and engagement have enriched the AI research landscape and broadened our perspectives.
Moreover, we wish to honor the memories of the cherished family members we lost in 2023. Their influence in our lives extended beyond personal bonds, and the principles they instilled in us continue to shape our journey. It is with great respect and affection that we acknowledge the indelible impact they have made, enabling us to persist in our scientific pursuits with determination and integrity.

REFERENCES

M. Z. Alom, T. M. Taha, C. Yakopcic, S. Westberg, P. Sidike, M. S. Nasrin, B. C. Van Esesn, A. A. S. Awwal, and V. K. Asari. The history began from alexnet: A comprehensive survey on deep learning approaches. arXiv preprint arXiv:1803.01164, 2018.
R. E. Amaro, J.-Y. Chen, J. M. Duarte, T. E. Hutton, C. Irving, M. C. Kandes, A. Majumdar, D. Y. Mishin, M. H. Nguyen, P. Rodriguez, F. Silva, R. S. Sinkovits, S. M. Strande, M. Tatineni, L. S. Tran, and N. Wolter. Voyager - an innovative computational resource for artificial intelligence & machine learning applications in science and engineering. In Practice and Experience in Advanced Research Computing, PEARC '23', pp. 278-282, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450399852. doi: 10.1145/3569951.3597597.
Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. SPICE: semantic propositional image caption evaluation. CoRR, abs/1607.08822, 2016. URL http://arxiv.org/abs/1607.08822.
J. Andreas. Language models as agent models. CoRR, abs/2212.01681, 2022. doi: 10.48550/arXiv.2212.01681.
J. A. Arjona-Medina, M. Gillhofer, M. Widrich, T. Unterthiner, J. Brandstetter, and S. Hochreiter. RUDDER: return decomposition for delayed rewards. In Advances in Neural Information Processing Systems 32, pp. 13566-13577, 2019.
M. Assran, Q. Duval, I. Misra, P. Bojanowski, P. Vincent, M. Rabbat, Y. LeCun, and N. Ballas. Self-supervised learning from images with a joint-embedding predictive architecture. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 15619-15629, 2023.
L. M. Augusto. Computational Logic. Vol. 1: Classical Deductive Computing with Classical Logic. College Publications, London, 2 edition, 2022.
F. Badita. 1337 Use Cases for ChatGPT & other Chatbots in the AI-Driven Era. Google Docs, 2022.
D. M. Beazley. Python Essential Reference. Developer's library: essential references for programming professionals. Addison-Wesley, 2009. ISBN 9780672329784. URL https://books.google.ro/books?id=Chr1NDlUcI8C.
M. Beck, K. Pöppel, M. Spanring, A. Auer, O. Prudnikova, M. Kopp, G. Klambauer, J. Brandstetter, and S. Hochreiter. xlstm: Extended long short-term memory, 2024.
E. Bengio, M. Jain, M. Korablyov, D. Precup, and Y. Bengio. Flow network based generative models for non-iterative diverse candidate generation. Advances in Neural Information Processing Systems, 34:27381-27394, 2021a.
Y. Bengio, T. Deleu, E. J. Hu, S. Lahlou, M. Tiwari, and E. Bengio. Gflownet foundations. arXiv preprint arXiv:2111.09266, 2021b.
T. R. Besold, A. d. Garcez, S. Bader, H. Bowman, P. Domingos, P. Hitzler, K.-U. Kuehnberger, L. C. Lamb, D. Lowd, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon, and G. Zaverucha. Neural-symbolic learning and reasoning: A survey and interpretation, 2017.
M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, L. Gianinazzi, J. Gajda, T. Lehmann, M. Podstawski, H. Niewiadomski, P. Nyczyk, and T. Hoefler. Graph of thoughts: Solving elaborate problems with large language models. arXiv preprint arXiv:2308.09687, 2023.
S. Biderman, H. Schoelkopf, Q. Anthony, H. Bradley, K. O'Brien, E. Hallahan, M. Aflah Khan, S. Purohit, S. Prashanth, E. Raff, A. Skowron, L. Sutawika, and O. van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023.
PENG Bo. Blinkdl/rwkv-lm: 0.01. Technical report, Zenodo, August 2021. URL https://doi.org/10.5281/zenodo.5196577.
W. Bradley Knox and Peter Stone. TAMER: Training an Agent Manually via Evaluative Reinforcement. In 2008 7th IEEE International Conference on Development and Learning, pp. 292-297, Monterey, CA, August 2008. IEEE. ISBN 978-1-4244-2661-4. doi: 10.1109/DEVLRN.2008.4640845.
J. D. Bransford and M. K. Johnson. Contextual prerequisites for understanding: Some investigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior, 11(6):717-726, 1972. ISSN 0022-5371.
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin (eds.), Advances in Neural Information Processing Systems, volume 33, pp. 1877-1901. Curran Associates, Inc., 2020.
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, 和 D. Amodei。语言模型是少量学习者。在 H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, 和 H. Lin (编),《神经信息处理系统进展》,第 33 卷,页 1877-1901。Curran Associates, Inc.,2020。
H. Chase. LangChain. Technical report, LangChain, 01 2023. URL https://github.com/hwchase17/ langchain.
H. Chase。LangChain。技术报告,LangChain,2023年1月。网址 https://github.com/hwchase17/langchain。
W. Chen, M. Kifer, and D. S. Warren. Hilog: A foundation for higher-order logic programming. The Journal of Logic Programming, 15(3):187-230, 1993. ISSN 0743-1066.
W. Chen, M. Kifer, 和 D. S. Warren。Hilog:高阶逻辑编程的基础。《逻辑编程杂志》,15(3):187-230,1993年。ISSN 0743-1066。
F. Chollet. On the measure of intelligence, 2019.
F. Chollet。关于智能的度量,2019年。
N. Chomsky. Three models for the description of language. IRE Transactions on Information Theory, 2(3):113-124, 1956. doi: 10.1109/TIT.1956.1056813.
N. Chomsky。语言描述的三种模型。《信息理论学报》,2(3):113-124,1956年。doi: 10.1109/TIT.1956.1056813。
P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei. Deep reinforcement learning from human preferences. Advances in neural information processing systems, 30, 2017.
P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg 和 D. Amodei. 从人类偏好中进行深度强化学习. 神经信息处理系统进展, 30, 2017.
A. Colmerauer and P. Roussel. The birth of Prolog. In HOPL-II, 1993.
A. Colmerauer 和 P. Roussel. Prolog 的诞生. 在 HOPL-II, 1993.
A. d'Avila Garcez and L. C. Lamb. Neurosymbolic ai: The 3rd wave. arXiv preprint arXiv:2012.05876, 2020.
A. d'Avila Garcez 和 L. C. Lamb. 神经符号人工智能:第三波. arXiv 预印本 arXiv:2012.05876, 2020.
A. d'Avila Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. Journal of Applied Logic,
A. d'Avila Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger 和 S. N. Tran. 神经符号计算:一种有效的方法论,用于机器学习与推理的原则性整合. 应用逻辑杂志,
2019.
A. Dawid and Y. LeCun. Introduction to latent variable energy-based models: A path towards autonomous machine intelligence. arXiv preprint arXiv:2306.02572, 2023.
A. Dawid 和 Y. LeCun. 潜变量基于能量的模型简介:通向自主机器智能的路径. arXiv 预印本 arXiv:2306.02572, 2023.
J. Degrave. Building A Virtual Machine inside ChatGPT. Technical report, Engraved, 11 2022. URL https: //www.engraved.blog/building-a-virtual-machine-inside/.
J. Degrave. 在 ChatGPT 中构建虚拟机. 技术报告, Engraved, 2022年11月. URL https://www.engraved.blog/building-a-virtual-machine-inside/.
F. Deniz, A. O. Nunez-Elizalde, A. G. Huth, and J. L. Gallant. The representation of semantic information across human cerebral cortex during listening versus reading is invariant to stimulus modality. Journal of Neuroscience, 39(39):7722-7736, 2019. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.0675-19.2019.
F. Deniz, A. O. Nunez-Elizalde, A. G. Huth 和 J. L. Gallant. 在听与读的过程中,人类大脑皮层对语义信息的表征对刺激模态是不变的. 神经科学杂志, 39(39):7722-7736, 2019. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.0675-19.2019.
D. C. Dennett. Real patterns. Journal of Philosophy, 88(1):27-51, 1991. doi: 10.2307/2027085.
D. C. Dennett. 真实模式. 哲学杂志, 88(1):27-51, 1991. doi: 10.2307/2027085.
M. Dilhara, A. Ketkar, and D. Dig. Understanding software-2.0: A study of machine learning library usage and evolution. ACM Transactions on Software Engineering and Methodology (TOSEM), 30(4):55:1-55:42, jul 2021. ISSN 1049-331X. doi: 10.1145/3453478.
M. Dilhara, A. Ketkar, 和 D. Dig. 理解软件2.0:机器学习库使用与演变的研究. ACM 软件工程与方法学汇刊 (TOSEM), 30(4):55:1-55:42, 2021年7月. ISSN 1049-331X. doi: 10.1145/3453478.
M.-C. Dinu, M. Hofmarcher, V. P. Patil, M. Dorfer, P. M. Blies, J. Brandstetter, J. A. Arjona-Medina, and S. Hochreiter. Xai and strategy extraction via reward redistribution. In A. Holzinger, R. Goebel, R. Fong, T. Moon, K.-R. Müller, and W. Samek (eds.), xxAI - Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers, pp. 177-205, Cham, 2022. Springer International Publishing. ISBN 978-3-031-04083-2. doi: 10.1007/978-3-031-04083-2_10.
M.-C. Dinu, M. Hofmarcher, V. P. Patil, M. Dorfer, P. M. Blies, J. Brandstetter, J. A. Arjona-Medina, 和 S. Hochreiter. 通过奖励重分配进行可解释人工智能和策略提取. 在 A. Holzinger, R. Goebel, R. Fong, T. Moon, K.-R. Müller, 和 W. Samek (编), xxAI - 超越可解释人工智能:国际研讨会, 与 ICML 2020 同期举行, 2020年7月18日, 奥地利维也纳, 修订和扩展论文, 第177-205页, Cham, 2022. 施普林格国际出版. ISBN 978-3-031-04083-2. doi: 10.1007/978-3-031-04083-2_10.
M.-C. Dinu, M. Holzleitner, M. Beck, H. D. Nguyen, A. Huber, H. Eghbal-zadeh, B. A. Moser, S. V. Pereverzyev, S. Hochreiter, and W. Zellinger. Addressing parameter choice issues in unsupervised domain adaptation by aggregation. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
M.-C. Dinu, M. Holzleitner, M. Beck, H. D. Nguyen, A. Huber, H. Eghbal-zadeh, B. A. Moser, S. V. Pereverzyev, S. Hochreiter, 和 W. Zellinger. 通过聚合解决无监督领域适应中的参数选择问题。在第十一届国际学习表征会议,ICLR 2023,卢旺达基加利,2023年5月1日至5日。OpenReview.net,2023年。
I. Donadello, L. Serafini, and A. d'Avila Garcez. Logic tensor networks for semantic image interpretation. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 1596-1602, 2017.
D. C. Dowson and B. V. Landau. The Fréchet distance between multivariate normal distributions. Journal of Multivariate Analysis, 12(3):450-455, 1982. doi: https://doi.org/10.1016/0047-259X(82)90077-X.
K. Ellis. Human-like few-shot learning via bayesian reasoning over natural language. arXiv preprint arXiv:2306.02797, 2023.
M. Fang, S. Deng, Y. Zhang, Z. Shi, L. Chen, M. Pechenizkiy, and J. Wang. Large language models are neurosymbolic reasoners. arXiv preprint arXiv:2401.09334, 2024.
E. Fedorenko, P.-J. Hsieh, A. Nieto-Castanon, S. Whitfield-Gabrieli, and N. Kanwisher. New method for fMRI investigations of language: Defining ROIs functionally in individual subjects. Journal of Neurophysiology, 104:1177-94, 08 2010. doi: 10.1152/jn.00032.2010.
E. Feigenbaum, B. G. Buchanan, J. Lederberg, C. Djerassi, et al. Dendral, 1965.
C. Fernando, D. Banarse, H. Michalewski, S. Osindero, and T. Rocktäschel. Promptbreeder: Self-referential self-improvement via prompt evolution. arXiv preprint arXiv:2309.16797, 2023.
R. F. Gamble, G.-C. Roman, H. C. Cunningham, and W. E. Ball. Applying formal verification methods to rule-based programs. Int. J. Expert Syst., 7(3):203-237, sep 1994. ISSN 0894-9077.
S. Ganguly and V. Pudi. Paper2vec: Combining graph and text information for scientific paper representation. In Joemon Jose et al. (eds.), Advances in Information Retrieval, volume 10193 of Lecture Notes in Computer Science. Springer, Cham, 2017. ISBN 978-3-319-56607-8. doi: 10.1007/978-3-319-56608-5_30.
Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, and H. Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2023.
A. Garcez, T. Besold, L. De Raedt, P. Földiák, P. Hitzler, T. Icard, K. Kühnberger, L. Lamb, R. Miikkulainen, and D. Silver. Neural-symbolic learning and reasoning: Contributions and challenges. In AAAI Conference, 2015.
M. Gauch, M. Beck, T. Adler, D. Kotsur, S. Fiel, H. Eghbal-zadeh, J. Brandstetter, J. Kofler, M. Holzleitner, et al. Few-shot learning by dimensionality reduction in gradient space. arXiv preprint arXiv:2206.03483, 2022.
X. Geng, A. Gudibande, H. Liu, E. Wallace, P. Abbeel, S. Levine, and D. Song. Koala: A dialogue model for academic research. Blog post, April 2023. URL https://bair.berkeley.edu/blog/2023/04/03/koala/.
Google. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
A. Goyal, A. Friesen, A. Banino, T. Weber, N. R. Ke, A. P. Badia, A. Guez, M. Mirza, P. C. Humphreys, K. Konyushova, M. Valko, S. Osindero, T. Lillicrap, N. Heess, and C. Blundell. Retrieval-augmented reinforcement learning. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 7740-7765. PMLR, 17-23 Jul 2022.
A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25):723-773, 2012.
A. Gu and T. Dao. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
Q. Guo, Z. Jin, X. Qiu, W. Zhang, D. Wipf, and Z. Zhang. CycleGT: Unsupervised graph-to-text and text-to-graph generation via cycle training. arXiv preprint arXiv:2006.04702, 2020.
K. Hamilton, A. Nayak, B. Božić, and L. Longo. Is neuro-symbolic AI meeting its promises in natural language processing? A structured review. Semantic Web, pp. 1-42, nov 2022. doi: 10.3233/sw-223228.
C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. Fernández del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke, and T. E. Oliphant. Array programming with NumPy. Nature, 585(7825):357-362, 2020. doi: 10.1038/s41586-020-2649-2.
M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pp. 6629-6640, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964.
S. Hochreiter. Toward a broad AI. Commun. ACM, 65(4):56-57, mar 2022. ISSN 0001-0782.
S. Hochreiter and J. Schmidhuber. Flat minima. Neural Comput., 9(1):1-42, 1997.
M. Holzleitner, L. Gruber, J. A. Arjona-Medina, J. Brandstetter, and S. Hochreiter. Convergence proof for actor-critic methods applied to PPO and RUDDER. arXiv preprint arXiv:2012.01399, 2020.
K. Hornik, M. Stinchcombe, and H. White. Multilayer feedforward networks are universal approximators. Neural Networks, 2:359-366, 1989. doi: 10.1016/0893-6080(89)90020-8.
J. Hu, H. Small, H. Kean, A. Takahashi, L. Zekelman, D. Kleinman, E. Ryan, A. Nieto-Castañón, V. Ferreira, and E. Fedorenko. Precision fMRI reveals that the language-selective network supports both phrase-structure building and lexical access during language production. bioRxiv, 2022. doi: 10.1101/2021.09.10.459596.
Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing. Harnessing deep neural networks with logic rules. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 2410-2420, Berlin, Germany, August 2016. Association for Computational Linguistics.
A. G. Huth, W. A. de Heer, T. L. Griffiths, F. E. Theunissen, and J. L. Gallant. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600):453-458, 2016. doi: 10.1038/nature17637.
B. Ibarz, J. Leike, T. Pohlen, G. Irving, S. Legg, and D. Amodei. Reward learning from human preferences and demonstrations in atari. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
B. Ibarz, V. Kurin, G. Papamakarios, K. Nikiforou, M. Abbana Bennani, R. Csordás, A. Dudzik, M. Bošnjak, A. Vitvitskyi, Y. Rubanova, A. Deac, B. Bevilacqua, Y. Ganin, C. Blundell, and P. Veličković. A generalist neural algorithmic learner. In Learning on Graphs Conference (LoG), 2022.
Wolfram Research, Inc. Mathematica, Version 13.2, 2022. URL https://www.wolfram.com/mathematica. Champaign, IL.
G. Indiveri, B. Linares-Barranco, T. Hamilton, A. van Schaik, R. Etienne-Cummings, T. Delbruck, S. Liu, P. Dudek, P. Häfliger, S. Renaud, J. Schemmel, G. Cauwenberghs, J. Arthur, K. Hynna, F. Folowosele, S. Saïghi, T. Serrano-Gotarredona, J. Wijekoon, Y. Wang, and K. Boahen. Neuromorphic silicon neuron circuits. Frontiers in Neuroscience, 5, 2011. ISSN 1662-453X. doi: 10.3389/fnins.2011.00073.
N. Jain, S. Vaidyanath, A. Iyer, N. Natarajan, S. Parthasarathy, S. Rajamani, and R. Sharma. Jigsaw: Large language models meet program synthesis. arXiv, 2021.
A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. Le Scao, T. Lavril, T. Wang, T. Lacroix, and W. El Sayed. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
J. Johnson, M. Douze, and H. Jégou. Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3):535-547, 2019.
E. Jones and J. Steinhardt. Capturing failures of large language models via human cognitive biases. arXiv preprint arXiv:2202.12299, 2022.
A. Karpathy. Software 2.0. Medium, 2017. URL https://karpathy.medium.com/ software-2-0-a64152b37c35.
N. Kassner, B. Krojer, and H. Schütze. Are Pretrained Language Models Symbolic Reasoners over Knowledge? In R. Fernández and T. Linzen (eds.), Proceedings of the 24th Conference on Computational Natural Language Learning, CoNLL 2020, Online, November 19-20, 2020, pp. 552-564. Association for Computational Linguistics, 2020. doi: 10.18653/v1/2020.conll-1.45.
T. Katsch. Gateloop: Fully data-controlled linear recurrence for sequence modeling. arXiv preprint arXiv:2311.01927, 2023.
D. Key, W.-D. Li, and K. Ellis. Toward trustworthy neural program synthesis. arXiv preprint arXiv:2210.00848, 2023.
G. Kim, P. Baldi, and S. McAleer. Language models can solve computer tasks, 2023.
L. Kirsch and J. Schmidhuber. Eliminating meta optimization through self-referential meta learning. arXiv preprint arXiv:2212.14392, 2022.
A. Köpf, Y. Kilcher, D. von Rütte, S. Anagnostidis, Z.-R. Tam, K. Stevens, A. Barhoum, N. M. Duc, O. Stanley, R. Nagyfi, S. ES, S. Suri, D. Glushkov, A. Dantuluri, A. Maguire, C. Schuhmann, H. Nguyen, and A. Mattick. Openassistant conversations - democratizing large language model alignment, 2023.
D. C. Kuncicky, S. I. Hruska, and R. C. Lacher. Hybrid systems: the equivalence of rule-based expert system and artificial neural network inference. Int. J. Expert Syst., 4(3):281-297, jan 1991. ISSN 0894-9077.
E. Kıcıman, R. Ness, A. Sharma, and C. Tan. Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. arXiv, 2023.
S. Lahlou, T. Deleu, P. Lemos, D. Zhang, A. Volokhova, A. Hernández-García, L. N. Ezzine, Y. Bengio, and N. Malkin. A theory of continuous generative flow networks. In Proceedings of the International Conference on Machine Learning, pp. 18269-18300. PMLR, 2023.
J. E. Laird. Introduction to Soar, 2022.
J. E. Laird, A. Newell, and P. S. Rosenbloom. Soar: An architecture for general intelligence. Artificial Intelligence, 33(1):1-64, 1987. ISSN 0004-3702.
L. C. Lamb, A. Garcez, M. Gori, M. Prates, P. Avelar, and M. Vardi. Graph neural networks meet neural-symbolic computing: A survey and perspective. In AAAI Conference, 2020.
P. Langley, J. Laird, and S. Rogers. Cognitive architectures: Research issues and challenges. Cognitive Systems Research, 10:141-160, 2009. doi: 10.1016/j.cogsys.2006.07.004.
Y. LeCun. A path towards autonomous machine intelligence, 2022. OpenReview Archive.
H. Li, Y. Su, D. Cai, Y. Wang, and L. Liu. A survey on retrieval-augmented text generation. arXiv preprint arXiv:2202.01110, 2022a.
Y. Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. Dal Lago, et al. Competition-level code generation with alphacode. Science, 378(6624):1092-1097, 2022b.
Z. Li, Z. Yang, and M. Wang. Reinforcement learning with human feedback: Learning dynamic choices via pessimism. arXiv preprint arxiv:2305.18438, 2023.
N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang. Lost in the middle: How language models use long contexts. arXiv preprint arXiv:2307.03172, 2023.
S. Lu, D. Guo, S. Ren, J. Huang, A. Svyatkovskiy, A. Blanco, C. Clement, D. Drain, D. Jiang, D. Tang, et al. Codexglue: A machine learning benchmark dataset for code understanding and generation. arXiv preprint arXiv:2102.04664, 2021.
M. Lutz. Learning Python: Powerful Object-Oriented Programming. Animal Guide. O'Reilly Media, 2013. ISBN 9781449355715.
Q. Lyu, S. Havaldar, A. Stein, L. Zhang, D. Rao, E. Wong, M. Apidianaki, and C. Callison-Burch. Faithful chain-of-thought reasoning, 2023.
J. MacGlashan, M. K. Ho, R. Loftin, B. Peng, G. Wang, D. L. Roberts, M. E. Taylor, and M. L. Littman. Interactive Learning from Policy-Dependent Human Feedback. In Proceedings of the 34th International Conference on Machine Learning, pp. 2285-2294. PMLR, July 2017.
M. Macsweeney. Neural systems underlying british sign language and audio-visual english processing in native users. Brain, 125:1583-1593, 07 2002. doi: 10.1093/brain/awf153.
A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe, U. Alon, N. Dziri, S. Prabhumoye, Y. Yang, S. Welleck, B. P. Majumder, S. Gupta, A. Yazdanbakhsh, and P. Clark. Self-refine: Iterative refinement with self-feedback, 2023.
K. Mahowald, A. A. Ivanova, I. A. Blank, N. Kanwisher, J. B. Tenenbaum, and E. Fedorenko. Dissociating language and thought in large language models: a cognitive perspective. CoRR, abs/2301.06627, 2023.

R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. De Raedt. DeepProbLog: Neural probabilistic logic programming. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
J. Mao, C. Gan, P. Kohli, J. B. Tenenbaum, and J. Wu. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. In 7th International Conference on Learning Representations, ICLR 2019, 2019.
G. Marcus. The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv preprint arXiv:2002.06177, 2020.
A. Martelli, A. Ravenscroft, and D. Ascher. Python Cookbook. O'Reilly Media, 2005. ISBN 9780596554743. URL https://books.google.ro/books?id=Q0s6Vgb98CQC.
J. McCarthy. Lisp: A programming system for symbolic manipulations. In Preprints of Papers Presented at the 14th National Meeting of the Association for Computing Machinery, ACM '59, pp. 1-4, New York, NY, USA, 1959. Association for Computing Machinery. ISBN 9781450373647. doi: 10.1145/612201.612243.
J. McCarthy, M. L. Minsky, N. Rochester, and C. E. Shannon. A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955. AI Magazine, 27(4):12-12, 2006.
W. S. McCulloch and W. Pitts. A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics, 5:115-133, 1943. doi: 10.1007/BF02478255.
L. Menenti, S. M. E. Gierhan, K. Segaert, and P. Hagoort. Shared language: Overlap and segregation of the neuronal infrastructure for speaking and listening revealed by functional MRI. Psychological Science, 22(9):1173-1182, 2011. doi: 10.1177/0956797611418347. PMID: 21841148.
Microsoft. Bing is your AI-powered copilot for the web. Technical report, Microsoft, 2023. URL https://bing.com/chat.
T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space, 2013a.
T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781, 2013b. URL http://dblp.uni-trier.de/db/journals/corr/corr1301.html#abs-1301-3781.
S. Milani, N. Topin, B. Houghton, W. H. Guss, S. P. Mohanty, K. Nakata, O. Vinyals, and N. S. Kuno. Retrospective analysis of the 2019 MineRL competition on sample efficient reinforcement learning. In H. J. Escalante and R. Hadsell (eds.), Proceedings of the NeurIPS 2019 Competition and Demonstration Track, volume 123 of Proceedings of Machine Learning Research, pp. 203-214. PMLR, Dec 2020.
L. De Moura and N. Bjørner. Z3: an efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08, pp. 337-340, Berlin, Heidelberg, 2008. Springer-Verlag. ISBN 3540787992.
A. Newell. Unified Theories of Cognition. Harvard University Press, USA, 1990. ISBN 0674920996.
A. Newell and H. Simon. The logic theory machine - a complex information processing system. IRE Transactions on Information Theory, 2(3):61-79, 1956.
A. Newell and H. A. Simon. Human problem solving. Prentice-Hall, pp. 920, 1972.
A. Newell and H. A. Simon. Computer science as empirical inquiry: symbols and search. Commun. ACM, 19(3):113-126, mar 1976. ISSN 0001-0782. doi: 10.1145/360018.360022.
A. Newell, J. C. Shaw, and H. A. Simon. Empirical explorations of the logic theory machine: a case study in heuristic. In IRE-AIEE-ACM '57 (Western): Papers presented at the February 26-28, 1957, western joint computer conference: Techniques for reliability, pp. 218-230, 1957. doi: 10.1145/1455567.1455605.
H. Nori, Y. T. Lee, S. Zhang, D. Carignan, R. Edgar, N. Fusi, N. King, J. Larson, Y. Li, W. Liu, R. Luo, S. M. McKinney, R. O. Ness, H. Poon, T. Qin, N. Usuyama, C. White, and E. Horvitz. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. arXiv preprint arXiv:2311.16452, 2023.
A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
OpenAI. Introducing ChatGPT. Technical report, OpenAI, November 2022. URL https://openai.com/blog/chatgpt.
OpenAI. GPT-4 Technical Report. arXiv, 2023.
L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. E. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano, J. Leike, and R. J. Lowe. Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155, 2022.
F. Paischer, T. Adler, V. Patil, A. Bitto-Nemling, M. Holzleitner, S. Lehner, H. Eghbal-Zadeh, and S. Hochreiter. History compression via language models in reinforcement learning. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato (eds.), Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pp. 17156-17185. PMLR, July 2022.
F. Paischer, T. Adler, M. Hofmarcher, and S. Hochreiter. Semantic helm: An interpretable memory for reinforcement learning. CoRR, abs/2306.09312, 2023. doi: 10.48550/arXiv.2306.09312.
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 311-318, 2002.
N. Park, D. Chae, J. Shim, S. Kim, E.-S. Kim, and J. Kim. Bridging the domain gap by clustering-based image-text graph matching. arXiv preprint arXiv:2310.02692, 2023.
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Z. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style, high-performance deep learning library. arXiv preprint arXiv:1912.01703, 2019.
V. P. Patil, M. Hofmarcher, M.-C. Dinu, M. Dorfer, P. M. Blies, J. Brandstetter, J. A. Arjona-Medina, and S. Hochreiter. Align-RUDDER: Learning from few demonstrations by reward redistribution. arXiv preprint arXiv:2009.14108, 2020.
F. Petroni, T. Rocktäschel, S. Riedel, P. S. H. Lewis, A. Bakhtin, Y. Wu, and A. H. Miller. Language Models as Knowledge Bases? In K. Inui, J. Jiang, V. Ng, and X. Wan (eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, pp. 2463-2473. Association for Computational Linguistics, 2019. doi: 10.18653/v1/D19-1250.
S. Pitis, M. R. Zhang, A. Wang, and J. Ba. Boosted prompt ensembles for large language models, 2023.
M. Poli, S. Massaroli, E. Nguyen, D. Y. Fu, T. Dao, S. Baccus, Y. Bengio, S. Ermon, and C. Ré. Hyena hierarchy: Towards larger convolutional language models. arXiv preprint arXiv:2302.10866, 2023.
B. Poole, A. Jain, J. T. Barron, and B. Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. arXiv preprint arXiv:2209.14988, 2022.
M. Qu and J. Tang. Probabilistic logic neural networks for reasoning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019.
R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, and C. Finn. Direct preference optimization: Your language model is secretly a reward model. arXiv preprint arXiv:2305.18290, 2023.
A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pp. 8821-8831. PMLR, 2021.
H. Ramsauer, B. Schäfl, J. Lehner, P. Seidl, M. Widrich, T. Adler, L. Gruber, M. Holzleitner, M. Pavlović, G. K. Sandve, et al. Hopfield networks is all you need. arXiv preprint arXiv:2008.02217, 2020.
M. Regev, C. J. Honey, E. Simony, and U. Hasson. Selective and invariant neural responses to spoken and written narratives. Journal of Neuroscience, 33(40):15978-15988, 2013. ISSN 0270-6474. doi: 10.1523/JNEUROSCI.1580-13.2013.

ReplikaAI. Pushing the Boundaries of AI to Talk to the Dead. Technical report, ReplikaAI, 2016. URL https://www.bloomberg.com/news/articles/2016-10-20/pushing-the-boundaries-of-ai-to-talk-to-the-dead.
B. Romera-Paredes, M. Barekatain, A. Novikov, et al. Mathematical discoveries from program search with large language models. Nature, 2023. doi: 10.1038/s41586-023-06924-6.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386-408, 1958. doi: 10.1037/h0042519.
L. Ruis, A. Khan, S. Biderman, S. Hooker, T. Rocktäschel, and E. Grefenstette. Large language models are not zero-shot communicators. CoRR, abs/2210.14986, 2022. doi: 10.48550/arXiv.2210.14986.
D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors. Nature, 323:533-536, 1986. doi: 10.1038/323533a0.
C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. Denton, S. K. S. Ghasemipour, B. K. Ayan, S. S. Mahdavi, R. G. Lopes, et al. Photorealistic text-to-image diffusion models with deep language understanding. arXiv preprint arXiv:2205.11487, 2022.
A. Santoro, A. Lampinen, K. Mathewson, T. Lillicrap, and D. Raposo. Symbolic behaviour in artificial intelligence. arXiv preprint arXiv:2102.03406, 2022.
T. Schick, J. Dwivedi-Yu, R. Dessi, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, and T. Scialom. Toolformer: Language models can teach themselves to use tools, 2023.
J. Schmidhuber. Gödel machines: Fully self-referential optimal universal self-improvers. Cognitive Technologies, 8: 199-226, 01 2007. doi: 10.1007/978-3-540-68677-4_7.
J. Schmidhuber. Driven by compression progress: A simple principle explains essential aspects of subjective beauty, novelty, surprise, interestingness, attention, curiosity, creativity, art, science, music, jokes. arXiv preprint arXiv:0812.4360, 2009.
J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. Lillicrap, and D. Silver. Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604-609, 2020. doi: 10.1038/s41586-020-03051-4.
K. Schweighofer, A. Radler, M.-C. Dinu, M. Hofmarcher, V. P. Patil, A. Bitto-Nemling, H. Eghbal-zadeh, and S. Hochreiter. A dataset perspective on offline reinforcement learning. In Conference on Lifelong Learning Agents, pp. 470-517. PMLR, 2022.
T. Scott, J. Gallée, and E. Fedorenko. A new fun and robust version of an fMRI localizer for the frontotemporal language system. Cognitive Neuroscience, 8:1-10, 07 2016. doi: 10.1080/17588928.2016.1201466.
N. Shinn, B. Labash, and A. Gopinath. Reflexion: an autonomous agent with dynamic memory and self-reflection, 2023.
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. P. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016. doi: 10.1038/nature16961.
D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815, 2017a.
D. Silver, J. Schrittwieser, K. Simonyan, et al. Mastering the game of go without human knowledge. Nature, 550:354-359, 2017b.
U. Singer, A. Polyak, T. Hayes, X. Yin, J. An, S. Zhang, Q. Hu, H. Yang, O. Ashual, O. Gafni, et al. Make-a-video: Text-to-video generation without text-video data. arXiv preprint arXiv:2209.14792, 2022.
K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N. Scales, et al. Large language models encode clinical knowledge. Nature, 620(7972):172-180, 2023.
K. Song, X. Tan, T. Qin, J. Lu, and T.-Y. Liu. Mpnet: Masked and permuted pre-training for language understanding. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS'20, pp. 1414, Red Hook, NY, USA, 2020. Curran Associates Inc. ISBN 9781713829546.
Spotify. Approximate Nearest Neighbors Oh Yeah. Technical report, Spotify, 2017.
T. R. Sumers, S. Yao, K. Narasimhan, and T. L. Griffiths. Cognitive architectures for language agents. arXiv preprint arXiv:2309.02427, 2023.
M. Summerfield. Programming in Python 3: A Complete Introduction to the Python Language. Developer's library. Addison-Wesley, 2010. ISBN 9780321680563.
J. J. Sun, M. Tjandrasuwita, A. Sehgal, A. Solar-Lezama, S. Chaudhuri, Y. Yue, and O. Costilla-Reyes. Neurosymbolic programming for science. arXiv preprint arXiv:2210.05050, 2022.
R. S. Sutton. Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts, Dept. of Comp. and Inf. Sci., 1984.
P. Swazinna, S. Udluft, D. Hein, and T. Runkler. Comparing model-free and model-based algorithms for offline reinforcement learning. arXiv preprint arXiv:2201.05433, 2022.
Z. Szabó, B. K. Sriperumbudur, B. Póczos, and A. Gretton. Learning theory for distribution regression. J. Mach. Learn. Res., 17(1):5272-5311, Jan 2016. ISSN 1532-4435.
R. Taori, I. Gulrajani, T. Zhang, Y. Dubois, X. Li, C. Guestrin, P. Liang, and T. B. Hashimoto. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample. Llama: Open and efficient foundation language models, 2023.
L. Tunstall, E. Beeching, N. Lambert, N. Rajani, K. Rasul, Y. Belkada, S. Huang, L. von Werra, C. Fourrier, N. Habib, N. Sarrazin, O. Sanseviero, A. M. Rush, and T. Wolf. Zephyr: Direct distillation of lm alignment, 2023.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett (eds.), Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
R. Vedantam, C. L. Zitnick, and D. Parikh. Cider: Consensus-based image description evaluation. CoRR, abs/1411.5726, 2014. URL http://arxiv.org/abs/1411.5726.
P. Veličković and C. Blundell. Neural algorithmic reasoning. Patterns, 2(7):100273, 2021. ISSN 2666-3899. doi: https://doi.org/10.1016/j.patter.2021.100273.
B. Wang, Z. Wang, X. Wang, Y. Cao, R. A. Saurous, and Y. Kim. Grammar prompting for domain-specific language generation with large language models. arXiv preprint arXiv:2305.19234, 2023a.
J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xu, et al. Milvus: A purpose-built vector data management system. In Proceedings of the 2021 International Conference on Management of Data, pp. 2614-2627, 2021a.
X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2023b.
Y. Wang, R.J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, et al. Tacotron: Towards end-to-end speech synthesis. arXiv preprint arXiv:1703.10135, 2017.
Y. Wang, W. Wang, S. Joty, and S. C. H. Hoi. Codet5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. arXiv preprint arXiv:2109.00859, 2021b.
Y. Wang, Y. Kordi, S. Mishra, A. Liu, N. A. Smith, D. Khashabi, and H. Hajishirzi. Self-instruct: Aligning language model with self generated instructions. arXiv preprint arXiv:2212.10560, 2022.
J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler, E. H. Chi, T. Hashimoto, O. Vinyals, P. Liang, J. Dean, and W. Fedus. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682, 2022a.
J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou. Chain of thought prompting elicits reasoning in large language models. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho (eds.), Advances in Neural Information Processing Systems, 2022b.
Y. Weng, M. Zhu, F. Xia, B. Li, S. He, S. Liu, B. Sun, K. Liu, and J. Zhao. Large language models are better reasoners with self-verification. arXiv preprint arXiv:2212.09561, 2023.
A. N. Whitehead and B. Russell. Principia Mathematica. Cambridge University Press, 1925-1927.
M. Widrich, M. Hofmarcher, V. P. Patil, A. Bitto-Nemling, and S. Hochreiter. Modern Hopfield Networks for Return Decomposition for Delayed Rewards. In Deep RL Workshop NeurIPS 2021, 2021.
Writesonic. ChatGPT Alternative Built With Superpowers - ChatSonic. Technical report, Chatsonic, 2022. URL https://writesonic.com/chat.
C. Xu, D. Guo, N. Duan, and J. McAuley. Baize: An open-source chat model with parameter-efficient tuning on self-chat data, 2023.
Z. Xu, H. van Hasselt, and D. Silver. Meta-gradient reinforcement learning. ArXiv, 2018.
L. Yang, S. Zhang, Z. Yu, G. Bao, Y. Wang, J. Wang, R. Xu, W. Ye, X. Xie, W. Chen, and Y. Zhang. Supervised Knowledge Makes Large Language Models Better In-context Learners. arXiv preprint arXiv:2312.15918, 2023.
S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601, 2023a.
S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2023b.
J. Ye, Z. Wu, J. Feng, T. Yu, and L. Kong. Compositional exemplars for in-context learning. arXiv preprint arXiv:2302.05698, 2023.
K. You, X. Wang, M. Long, and M. Jordan. Towards accurate model selection in deep unsupervised domain adaptation. In K. Chaudhuri and R. Salakhutdinov (eds.), Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 7124-7133. PMLR, Jun 9-15 2019.
YouWrite. The AI Search Engine You Control. Technical report, You.com, 2022. URL https://you.com.
D. Yu, B. Yang, D. Liu, H. Wang, and S. Pan. A survey on neural-symbolic learning systems. Neural Networks, 166: 105-126, 2023. ISSN 0893-6080.
W. Yuan, R. Y. Pang, K. Cho, S. Sukhbaatar, J. Xu, and J. Weston. Self-rewarding language models. arXiv preprint arXiv:2401.10020, 2024.
J. Zhang, B. Chen, L. Zhang, X. Ke, and H. Ding. Neural, symbolic and neural-symbolic reasoning on knowledge graphs. AI Open, 2:14-35, 2021.
M. Zhuge, W. Wang, L. Kirsch, F. Faccio, D. Khizbullin, and J. Schmidhuber. Language agents as optimizable graphs, 2024.

A BROAD AI AND NEURO-SYMBOLIC SYSTEMS

Our work focuses on broad AI (Hochreiter, 2022) (see Figure 7) through the integration of symbolic and sub-symbolic AI methodologies. Broad AI extends beyond the restricted focus on single-task performance that characterizes narrow AI: broad AI systems are engineered to handle a range of tasks with a high degree of autonomy, utilizing sensory input, accumulated experiences, and previously developed skills.
Figure 7: Hierarchical model of "cognitive" abilities of AI systems (Chollet, 2019; Hochreiter, 2022). The figure illustrates capabilities such as knowledge transfer, interaction, adaptability, robustness, abstraction, advanced reasoning, and efficiency. Broad AI aims to mimic human cognitive adaptability and robustness through advanced methodologies like few-shot learning, self-supervised contrastive learning, and context-sensitive sensory processing. Notably, broad AI applies principles such as conceptual short-term memory and modern Hopfield networks (Ramsauer et al., 2020) to better integrate context and memory, thus avoiding pitfalls like explaining away and short-cut learning. We acknowledge the potential of NeSy systems as a significant step towards AI systems capable of performing any cognitive task with human-like proficiency.
NeSy methods form the basis for developing new cognitive architectures (Newell & Simon, 1956; Newell et al., 1957; Newell & Simon, 1972; Newell, 1990; Langley et al., 2009; Laird, 2022; Dawid & LeCun, 2023; Sumers et al., 2023; LeCun, 2022; Assran et al., 2023). This hybridization produces computational graphs capable of context-aware learning and reasoning, allowing AI to execute complex tasks with human-like flexibility.
Borrowing nomenclature from Karpathy (2017) and Dilhara et al. (2021), we refer to the next generation of software as Software 3.0, which consists of applications that are neither pre-determined at design time nor learned through neuromorphic circuits (Indiveri et al., 2011), but are established purely at inference time in the "thought" process of a NeSy system.
To enable such systems, we require a more native integration of probabilistic programming paradigms into our contemporary programming stack (see the illustration in Figure 9), and we must make their utilization a commodity for practitioners and researchers alike.

A.1 BROADER IMPACT

With LLMs becoming more and more accessible, a development recently made possible by the vast open-source contributions of Köpf et al. (2023); Touvron et al. (2023); Taori et al. (2023); Xu et al. (2023); Geng et al. (2023); Biderman et al. (2023), embedded accelerators for LLMs, or more generally NeSy engines, will be ubiquitous in future computation platforms such as wearables, smartphones, tablets, consoles, and notebooks. Although current execution cycles are slow and error-prone, we expect further performance gains through improved operating-system-level optimizations, dedicated GPU-centric hardware refinement, and improved software interoperability. We believe that contemporary programming stacks will need first-class features for constructing and evaluating generative computational graphs. This includes, but is not limited to, compositional, parallelizable, and simulation-based executions with polymorphic operations and self-referential structures. Current programming languages often have disjointed or makeshift solutions for these concepts in the context of generative processes. We believe that integral probabilistic support for these concepts in modern software and hardware will unlock new programming paradigms that can fully take advantage of generative architectures. We hope the community will consider these ideas as essential components of contemporary computing.
Figure 8: Evolution of software paradigms: from Software 1.0's rigid specification in classical programming, to Software 2.0's data-driven, objective-function-focused differentiable programming, to Software 3.0's NeSy systems that emphasize human-centric, interaction-based programming with computational graphs. This progression represents a shift from explicit task-specific programming to abstract, adaptive systems that cater to dynamic user preferences.
We also expect significant progress from processing central language concepts through system-on-a-chip (SoC) solutions of pre-trained models, with linear probing layers for hot-swappable weight exchange of task-specific projections and executions. A wide range of functionalities can then be offloaded to probabilistic programming languages to operate on dedicated symbols and streamline the vector-valued mappings between the concept space and the underlying problem space, avoiding boilerplate code to load and unload network weights.
Furthermore, we believe that many gains in representational stability and consistency may be obtained through multimodal data training and through improved alignment based on operator-learning-oriented functionalities and workflow-related scoring functionalities, analogous to our introduced quality measure. Gains in representational stability also benefit self-instruction and self-referential sub-process evaluations, which enable the dynamic creation and evaluation of complex concepts acquired from only one or a few examples at inference time. We believe this will enable the creation of autonomously self-evolving cognitive architectures (Langley et al., 2009; Dawid & LeCun, 2023; Sumers et al., 2023). We therefore see an inherent connection to generative design as an analogy for creating coherent and stable "thought" computational graphs, and believe this paves the path toward broad AI systems (see Section A) and is a requirement for developing artificial general intelligent agents.
Figure 9: The illustration demonstrates the integration of Neuro-Symbolic computation within the contemporary programming stack. Probabilistic programming paradigms are embedded natively alongside traditional languages and environments, facilitated by interfaces to various data sources, compute engines, and human interaction tools, streamlining their adoption in practical and research applications.
Finally, we also wish to express our concern about recent economic trends in the deep-tech industry, where we observe an AI-related concentration of data and resources, coupled with a tendency towards closed-source practices. We strongly advocate increased transparency and exchange of ideas to ensure diverse and collective growth in our socioeconomic landscape. We therefore push towards democratic and open-source initiatives.

B CONNECTION BETWEEN FRÉCHET DISTANCE AND MAXIMUM MEAN DISCREPANCY

Let us consider a Gaussian kernel defined by the expression
(5)  K(x, y) = \exp\left( -\frac{\| x - y \|^2}{2 \sigma^2} \right),
where σ is the bandwidth parameter of the kernel and ‖·‖ denotes the Euclidean norm. Utilizing K, we can now construct a measure of distance between distributions by embedding them into the Reproducing Kernel Hilbert Space (RKHS) induced by K, using kernel mean embeddings. The resulting distance is called the Maximum Mean Discrepancy (MMD).
More precisely, the MMD between two probability distributions P and Q is encoded in the RKHS through mean embeddings, which can be expressed as
(6)  \mathrm{MMD}^2(P, Q) = \left\| \mathbb{E}_{x \sim P}[\phi(x)] - \mathbb{E}_{y \sim Q}[\phi(y)] \right\|_{\mathrm{RKHS}}^2,
where ϕ(·) represents the feature mapping to the RKHS corresponding to the kernel K.
On the other hand, for multivariate Gaussian distributions, we can use the Fréchet distance as a measure of similarity, which is nothing but the associated Wasserstein-2 distance, for which an explicit formula is available in the Gaussian case. The resulting expression is as follows (Dowson & Landau, 1982):
(7)  d^2(X_1, X_2) = \| \mu_1 - \mu_2 \|_2^2 + \mathrm{Tr}\left( C_1 + C_2 - 2 (C_1 C_2)^{1/2} \right),
where X₁ ∼ N(μ₁, C₁) and X₂ ∼ N(μ₂, C₂), and Tr(·) indicates the trace of a matrix.
To establish an approximation of the Fréchet distance using the Gaussian kernel, we take C₁ = σ²I and C₂ = σ²I as identity covariance matrices scaled by σ². This assumption allows us to focus solely on the disparities in the mean vectors:
(8)  d^2(X_1, X_2) = \| \mu_1 - \mu_2 \|_2^2,
setting aside the effect of different covariance structures.
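The reduction above can be checked numerically. The following is a minimal sketch of Eq. (7) that, for simplicity, assumes diagonal covariance matrices so the matrix square root reduces to an elementwise square root (the general case needs a matrix square root such as scipy.linalg.sqrtm; the function name is illustrative):

```python
import numpy as np

def frechet_distance_sq(mu1, C1, mu2, C2):
    # d^2 = ||mu1 - mu2||_2^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2});
    # with diagonal C1, C2 the matrix square root is elementwise.
    covmean = np.diag(np.sqrt(np.diag(C1) * np.diag(C2)))
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(C1 + C2 - 2.0 * covmean))

mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 4.0])
sigma2 = 0.5
I = np.eye(2)
# Identical covariances: the trace term vanishes, leaving ||mu1 - mu2||_2^2 = 25.
d2 = frechet_distance_sq(mu1, sigma2 * I, mu2, sigma2 * I)
assert abs(d2 - 25.0) < 1e-9
```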
Given these conditions, we attempt to argue that the Fréchet distance behaves similarly to the MMD:
(9)  d^2(X_1, X_2) = \| \mu_1 - \mu_2 \|_2^2 \approx \mathrm{MMD}^2(P, Q).
Heuristically, at least for small ‖μ₁ − μ₂‖, the associated kernel evaluations K(X₁, X₂) tend to be large (see also Hochreiter & Schmidhuber (1997)), so the cross term nearly cancels the within-sample terms associated with K(X₁, X₁) and K(X₂, X₂) (which coincide due to the same covariance structure), leading to a small MMD.
In the next section, we further elaborate on the MMD and a possible score that can be derived from it.

B.1 Extended Simplification of the MMD Calculation

To understand the simplification of the MMD, we formally express the MMD in terms of kernel sums over pairs of samples within and across two samples X and Y:
(10)  \mathrm{MMD}^2(X, Y) = \frac{1}{m(m-1)} \sum_{i \neq j} k(x_i, x_j) - \frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j) + \frac{1}{n(n-1)} \sum_{i \neq j} k(y_i, y_j),
where m and n are the sizes of the samples X and Y, respectively.
Empirical observations have led to the conclusion that the within-sample terms \sum_{i \neq j} k(x_i, x_j) and \sum_{i \neq j} k(y_i, y_j) cancel out the cross terms \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j) under certain conditions. This can be due to the following:
  • In high-dimensional embedding spaces, distributions of embedding vectors are often closely related and normally distributed.
  • If the samples X and Y are drawn from distributions P and Q where their mean embeddings are nearly orthogonal in the RKHS, it is the dissimilarity across samples, rather than that within, that is most relevant.
Therefore, under these specific conditions, it becomes justifiable to focus on the cross terms, yielding the following proposal for a distance measure:
(11)  \widetilde{\mathrm{MMD}}^2(X, Y) \approx -\frac{2}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} k(x_i, y_j).
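As a sanity check, the estimators in Eqs. (10) and (11) can be sketched in a few lines of NumPy; the helper names below are illustrative and not part of SymbolicAI:

```python
import numpy as np

def k(x, y, sigma=1.0):
    # Gaussian kernel from Eq. (5).
    return float(np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2)))

def mmd2(X, Y, sigma=1.0):
    # Unbiased estimator of Eq. (10).
    m, n = len(X), len(Y)
    xx = sum(k(X[i], X[j], sigma) for i in range(m) for j in range(m) if i != j)
    yy = sum(k(Y[i], Y[j], sigma) for i in range(n) for j in range(n) if i != j)
    xy = sum(k(x, y, sigma) for x in X for y in Y)
    return xx / (m * (m - 1)) - 2.0 * xy / (m * n) + yy / (n * (n - 1))

def mmd2_cross_only(X, Y, sigma=1.0):
    # Keeps only the cross term of Eq. (10), as proposed in Eq. (11).
    xy = sum(k(x, y, sigma) for x in X for y in Y)
    return -2.0 * xy / (len(X) * len(Y))

X = [np.array([0.0]), np.array([0.1]), np.array([0.2])]
Y_near = [np.array([0.05]), np.array([0.15]), np.array([0.25])]
Y_far = [np.array([5.0]), np.array([5.1]), np.array([5.2])]
# Both estimators rank the distant sample as less similar to X:
assert mmd2(X, Y_far) > mmd2(X, Y_near)
assert mmd2_cross_only(X, Y_far) > mmd2_cross_only(X, Y_near)
```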

C STRUCTURE

Primitives In the SymbolicAI framework, at the core lies the concept of Primitives and the dynamic type creation of Symbol objects, which are central to inheriting types of behaviors. Primitives are pre-defined operations that act on Symbol objects, encapsulating basic operations such as arithmetic, logic, or casting operations, to name a few. These operations are crucial to the framework's versatility and form the foundation for more complex interactions within computational graphs. Essentially, they can be viewed as contextualized functions that accept a Symbol object, send it to the NeSy engine for evaluation, and return one or more new objects, primarily new symbols. One of the key features of operations is their polymorphism, which allows them to be applied to various data types, such as strings, integers, floats, lists, and more, with different behaviors depending on the specific object instance. To execute operations, we utilize the Symbol object's value attribute containing the original data type, which is then sent as a string representation to the engines to execute the needed operations. Consequently, all values are cast to a string representation. Remember, this was our implicit assumption (see Section 4). For custom objects, it is therefore essential to define a string representation that preserves their semantics.
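To make the string-casting convention concrete, here is a deliberately simplified toy sketch (not the actual SymbolicAI implementation; a plain string comparison stands in for the NeSy engine):

```python
# Illustrative sketch only: a minimal Symbol whose primitive operations cast
# the wrapped value to a string before handing it to a (mocked) engine.
class ToySymbol:
    def __init__(self, value):
        self.value = value

    def _to_engine_repr(self):
        # All values are cast to a string representation for the engine.
        return str(self.value)

    def __eq__(self, other):
        # A real NeSy engine would evaluate semantic equality; here we
        # mock it with a plain string comparison.
        other_repr = other._to_engine_repr() if isinstance(other, ToySymbol) else str(other)
        return self._to_engine_repr() == other_repr

s = ToySymbol(42)
assert s == "42"            # polymorphic: compares against a string
assert s == ToySymbol(42)   # ...or against another symbol
```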
Symbol Objects Creation and Dynamic Typing A Symbol object is a versatile entity that can encapsulate a variety of data types and behaviors. The creation of Symbol objects is facilitated through a metaclass, which enables the dynamic typing of these objects to inherit behaviors from a collection of primitives. This dynamic typing system is important for extending the functionality of Symbol objects beyond simple data containers; they contain specific behaviors appropriate for the operations they will perform. For instance, a Symbol object may possess the behaviors of arithmetic computations, string manipulations, or even logical comparisons, depending on the defined primitives.
Type Inheritance and Expression Creation Type inheritance in SymbolicAI is leveraged to create new expressions, which are specialized forms of Symbol objects designed to represent parts of a computational graph. An Expression extends the capabilities of Symbol objects by providing a structured way to define complex functionalities: it inherits the properties of Symbol objects while also being able to define its own unique behavior through a forward method, which is analogous to a computational graph node's evaluation function. Figure 10 gives an overview of an exemplary inheritance branch. Each Expression must feature a forward method, which must be overwritten to define its behavior. The inherited __call__ method invokes the forward method, evaluating the expression and returning the result. This design pattern facilitates lazy evaluation of expressions, allowing for complex composition of expressions.
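The forward/__call__ pattern can be illustrated with a minimal, self-contained sketch (assumed structure only; SymbolicAI's real classes route forward through a NeSy engine):

```python
class Expression:
    def forward(self, value):
        raise NotImplementedError  # subclasses must override

    def __call__(self, value):
        # __call__ delegates to forward, evaluating the expression.
        return self.forward(value)

class Upper(Expression):
    def forward(self, value):
        return value.upper()

class Exclaim(Expression):
    def forward(self, value):
        return value + "!"

# Expressions compose lazily; nothing runs until the outer call.
pipeline = lambda v: Exclaim()(Upper()(v))
assert pipeline("hello") == "HELLO!"
```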
Inherited from the Symbol class, the _sym_return_type and static_context properties establish the context in which the current Expression operates. The static_context impacts all operations of the current Expression subclass, while the _sym_return_type guarantees the acquisition of the desired return object type post-evaluation. Typically, this returns the current type but can be configured to return a different type. A more in-depth examination of both notions will be provided in the following section.
Figure 10: Class diagram showing the inheritance and composition relationships among Symbol, Expression, and other inherited classes. Symbol serves as the base class for Expression, from which all the other types are derived. Other types may contain or associate with zero or more Symbol types. For example, we illustrate how Sequence derives from Expression, and the multiplicity '0..*' indicates that a Sequence can contain any number of Expression instances.
Utilizing Decorators for Operation Definition Decorators serve as a bridge between the declarative nature of symbolic operations and the imperative execution model of programming languages. By augmenting function definitions with decorators, the framework can dynamically assign operations to Symbol or Expression objects, which are then interpreted by the underlying NeSy engine or traditional solvers.
For example, the @core.logic decorator can be used to augment a Symbol object with the capability to perform logical and, or, or not operations contextually. Similarly, the @core.combine decorator allows the framework to define the semantics of combining or adding two symbolic values, regardless of their underlying data representations.

# Example of using decorators to define logical operations
@core.logic(operator='and')
def _some_logic(self, other):
    pass

Aspect-Oriented Programming The aspect-oriented programming paradigm offers a functional approach for extending or modifying the behavior of functions or methods without altering their code directly. This adheres to the principles of modularity and separation of concerns, as it allows for the isolation of specific functionalities while maintaining the original function's core purpose. By wrapping the original function, decorators provide an efficient and reusable way of adding or modifying behaviors. For instance, SymbolicAI integrates the zero- and few-shot learning with default fallback functionalities of pre-existing code.
Decorators bring several advantages (Beazley, 2009; Martelli et al., 2005; Summerfield, 2010; Lutz, 2013):
  • Reusability: Decorators promote code modularity, enhancing code reusability and contributing to software maintainability. This advantage is particularly salient when managing a variety of operations, reducing redundancy and simplifying the integration of new functionalities.
  • Composition: Decorators support function composition, allowing developers to construct complex functionalities from pre-existing code blocks without the need to expand the codebase or rely on complex inheritance hierarchies.
  • Adaptability: Through decorators we can easily modify or extend the behavior of operations without changing their core implementation. This flexibility facilitates the generation of adaptive workflows and reliable fallback mechanisms when experimental implementations do not fulfill required constraints.
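The adaptability point, in particular the fallback mechanism, can be sketched with a small, hypothetical decorator (with_fallback is not part of symai's API; it only illustrates the aspect-oriented wrapping idea):

```python
import functools

def with_fallback(fallback):
    # Wrap a function so that if the "neural" path fails, a default
    # implementation of pre-existing code runs instead.
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception:
                return fallback(*args, **kwargs)
        return wrapper
    return decorator

@with_fallback(fallback=lambda text: text.split())
def tokenize(text):
    raise RuntimeError("engine unavailable")  # simulate a failed engine call

assert tokenize("a b c") == ["a", "b", "c"]
```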
Symbol Class and Computational Graph Elements A computational graph in the SymbolicAI framework is an assembly of interconnected Symbol objects, each encapsulating a unit of data and the operations that can be performed on it, together composing complex workflows.
The Symbol class is an abstraction representing data and context. It holds not only the value itself, but also metadata that guides its transformation and interpretation. Through inheritance and compositionality, the Symbol can be extended to contain a reference to its parent and children, naturally forming a directed graph structure where the nodes are symbols and the edges represent relationships between a symbol and its derivative computations.
The Linker class is a metadata subclass that tracks relationships and results, effectively annotating the graph with execution details. It keeps records of nodes' keys, allowing quick retrieval of related computational outcomes within the graph, and aids in tasks such as error tracing and debugging.
A central concept in this structure is the notion of root, which points to the origin of the computational sequence. Accessing the root allows backtracking through the graph, making it possible to aggregate results and inspect the flow of computation that led to the current node.
The computational graph's structure is further enriched by properties like nodes, edges, and graph itself, which collectively enable the comprehensive query of the computation's topology. These properties are used internally to enable graph visualizations, which are useful for debugging and analysis.
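The parent/children links and root backtracking described above can be sketched as follows (an assumed, simplified design, not SymbolicAI's internals):

```python
class GraphSymbol:
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    @property
    def root(self):
        # Backtrack through parent links to the origin of the computation.
        node = self
        while node.parent is not None:
            node = node.parent
        return node

a = GraphSymbol("input")
b = GraphSymbol("cleaned", parent=a)
c = GraphSymbol("translated", parent=b)
assert c.root is a
assert [child.value for child in a.children] == ["cleaned"]
```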
Expression of a Computational Graph In practice, consider the Expression class, which extends the functionality of the Symbol class. When composing a Sequence of Expression objects, we are effectively composing operations in a predetermined order.
For instance, an expression like
Sequence(Clean(), Translate(), Outline())
represents a procedure that first cleans the data, then translates it, and finally outlines the essential information, each expression receiving the output of its predecessor as input and passing its result to its successor.
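The Sequence pattern can be mimicked with plain callables (the lambdas below are hypothetical stand-ins for Clean, Translate, and Outline, which in SymbolicAI would call the NeSy engine):

```python
class Sequence:
    def __init__(self, *expressions):
        self.expressions = expressions

    def __call__(self, value):
        # Each expression receives its predecessor's output as input.
        for expression in self.expressions:
            value = expression(value)
        return value

pipeline = Sequence(
    lambda s: s.strip(),                  # stand-in for Clean
    lambda s: s.replace("hi", "hello"),   # stand-in for Translate
    lambda s: s[:5],                      # stand-in for Outline (truncate)
)
assert pipeline("  hi there  ") == "hello"
```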

D INSTALLATION

The installation of the SymbolicAI framework is straightforward and can be done through the Python package manager pip. To install SymbolicAI, open a terminal and execute the following command in your current Python environment:
pip install symbolicai
This command will install the latest version of SymbolicAI along with its core dependencies, enabling the integration of the framework into Python applications. If you intend to utilize the framework with local engines⁵, or with engines powered by external APIs such as OpenAI's API, additional installation steps are required.

D.1 ENGINE CONFIGURATION

Before the first run, it is necessary to configure the required modules and optionally set the necessary API keys to activate the respective engines. This can be done in multiple ways, but we recommend using the configuration wizard by running this command in the terminal:
symwzd
This step is essential to register the engines internally for subsequent runs.
For instance, SymbolicAI includes OpenAI's GPT models as a NeSy engine. To only set or change the OpenAI API key, the following command is used before starting a SymbolicAI instance:
# Linux / MacOS
export OPENAI_API_KEY=""
After setting up the API keys, the SymbolicAI library is imported in Python as follows:
import symai
For more low-level changes, we store everything under the $HOME/.symai folder, such as the symai.config.json file, which stores every key, both registered and not registered.

D.2 OPTIONAL INSTALLATIONS

The SymbolicAI framework is designed to leverage multiple engines for a variety of operations. To fully utilize these capabilities, you may install additional dependencies or set up the optional API keys for specific engines like WolframAlpha, SerpApi, and others. In Figure 11 we conceptually outline the connection between the utilization of an LLM and its interaction with other tools and solvers. Instructions and operations can be initiated by any user, pre-scripted knowledge base, or learned meta-agent.
For instructions on additional installations, including the support of optional engines, refer to the documentation provided with the framework. This documentation will give detailed steps on installing optional dependencies and configuring additional API keys.

E IMPLEMENTATION DETAILS

Let us now define some Symbol objects and perform some basic manipulations.

E.1 FUZZY COMPARISON

For instance, let us consider fuzzy⁶ comparisons. Within SymbolicAI, this enables more adaptable and context-aware evaluations, accommodating the inherent uncertainties and variances often encountered in real-world data.
import numpy
import symai

s = symai.Symbol(3.1415)
s == numpy.pi

5 The local engines are experimental and run on your local machine. For more details, refer to the "Local Neuro-Symbolic Engine" section of the documentation.
6 Not related to fuzzy logic, which is a topic under active consideration.

Figure 11: The SymbolicAI framework integrates a Large Language Model (LLM) with diverse tools and solvers through a conceptual interaction stack. The framework enables operations initiated by users, knowledge bases, or Whisper via conditions and constraints, enhancing the AI's problem-solving capabilities.
[Output]:
True
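Setting the NeSy engine aside, the effect of such a fuzzy comparison can be approximated in plain Python. The sketch below is illustrative only: the tolerance-based numeric check and the fuzzy_equals name are assumptions, not the SymbolicAI implementation, which instead conditions the engine with few-shot examples.

```python
# Illustrative stand-in for the NeSy-backed fuzzy comparison (hypothetical).
def fuzzy_equals(a, b, tol=1e-2):
    """Compare two values numerically within a tolerance; fall back to a
    normalized string comparison when the operands are not numeric."""
    try:
        return abs(float(a) - float(b)) <= tol
    except (TypeError, ValueError):
        return str(a).strip().lower() == str(b).strip().lower()

print(fuzzy_equals(3.1415, 3.14159265))  # True: close enough numerically
```

The real operator also handles comparisons that no hand-written rule covers, since the engine resolves them semantically.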

E.2 DYNAMIC CASTING

By enabling sentence subtraction and dynamic casting within SymbolicAI, we utilize the generalization capability of NeSy engines to manipulate and refine text-based data, creating more meaningful and contextually relevant outcomes. The integration of dynamic casting with Symbol objects in our API allows users to perform operations between Symbol objects and various data types, such as strings, integers, floats, and lists, without compromising on readability or simplicity.
s = symai.Symbol('Hello my enemy')
s - 'enemy' + 'friend'
[Output]:
(value=Hello my friend)
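Under the hood, this relies on Python operator overloading. The following minimal sketch shows the mechanics in plain Python; the MiniSymbol class is hypothetical, and the literal string replacement is a stand-in assumption for the engine's semantic edit:

```python
# Minimal sketch of dynamic casting via operator overloading (hypothetical
# class; the real Symbol delegates '-' and '+' to the NeSy engine).
class MiniSymbol:
    def __init__(self, value):
        self.value = str(value)  # dynamic cast: accept str, int, float, ...

    def __sub__(self, other):
        # Stand-in for the engine: remove the (stringified) operand.
        return MiniSymbol(self.value.replace(str(other), "").strip())

    def __add__(self, other):
        # Stand-in for the engine: append the (stringified) operand.
        return MiniSymbol(f"{self.value} {other}".strip())

    def __repr__(self):
        return f"(value={self.value})"

s = MiniSymbol("Hello my enemy")
print(s - "enemy" + "friend")  # (value=Hello my friend)
```

The dynamic cast happens in the constructor: any operand type is coerced into the string domain, which is why mixed-type expressions remain readable.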

E.3 SYMBOLS AND EMBEDDINGS

It is worth noting that encoding a complex object into a string sometimes precludes reconstituting the object. However, this concern does not substantially impede our methodology, as we can employ approximations or proxy representations stored by the vector-valued property to effectively re-map objects. These representations are obtained through the respective embedding models.

E.4 TRANSLATION

In today's increasingly interconnected world, translation between languages is fundamental, making it an essential feature.
s = symai.Symbol("Welcome to our tutorial.")
s.translate('German')
[Output]:
(value=Willkommen zu unserem Tutorial.)

E.5 FILTERING, RANKING, EXTRACTION

Incorporating data-agnostic operations like filtering, ranking, and pattern extraction into our API allows users to easily manipulate and analyze diverse data sets.
s = symai.Symbol(numpy.array([1, 2, 3, 4, 5, 6, 7]))
s.rank(measure='numerical', order='descending')

[Output]:
(value=['7', '6', '5', '4', '3', '2', '1'])
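Without the engine, the numerical case of rank reduces to an ordinary sort. The helper below is a plain-Python approximation (the function name is hypothetical; the string-typed output mirrors the engine's behavior shown above):

```python
# Plain-Python stand-in for Symbol.rank with measure='numerical' (illustrative).
def rank_numerical(values, order="descending"):
    ordered = sorted(values, reverse=(order == "descending"))
    return [str(v) for v in ordered]  # the engine returns string-typed items

print(rank_numerical([1, 2, 3, 4, 5, 6, 7]))
```

The engine-backed version generalizes this to non-numerical measures, e.g. ranking sentences by relevance, where no closed-form comparator exists.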

E.6 IMPLICATIONS

One of the main objectives behind developing SymbolicAI was to facilitate reasoning capabilities in conjunction with the statistical inference inherent in LLMs. Consequently, we can carry out deductive reasoning operations through the Symbol objects. For instance, it is feasible to establish a series of operations with rules delineating the causal relationship between two symbols. The subsequent example illustrates the utilization of the & operator to compute the logical implication derived from the interaction of two distinct symbols.
s1 = symai.Symbol('The horn only sounds on Sundays.')
s2 = symai.Symbol('I hear the horn.')
s1 & s2

[Output]:
(value=It is Sunday.)
In the above example, the & operator overloads the logical and operator and extends its functionality. Furthermore, we can establish more sophisticated logical operators for and, or, and xor that can be grounded in formal proofs and utilize the NeSy engine to parse data structures before evaluating the expressions. This enables the definition of bespoke operations for executing intricate and robust logical operations, incorporating constraints to validate outcomes and guide the computation towards the desired behavior.

E.7 CUSTOM OPERATIONS

The following example demonstrates how to define a custom == operation by overriding the __eq__ method and providing a custom prompt object with a list of examples:

import symai

class Demo(symai.Symbol):
    def __eq__(self, other) -> bool:
        # define nested function
        @symai.core.equals(examples=symai.Prompt([
            "1 == 'ONE' =>True",
            "'six' == 7 =>False",
            "'Acht' == 'eight' =>True",
            ...
        ]))
        def _func(_, other) -> bool:  # [optional] cast return type
            return False  # [optional] default behavior on failure (2. below)
        return _func(self, other)

As illustrated in the example, this is also the method we used to implement basic operations in Symbol, namely by defining local functions that are then decorated with the respective operation decorator from the symai.core.py file. The symai.core.py file is a collection of pre-defined operation decorators that can be quickly applied to any function. We use locally defined functions instead of directly decorating the main methods for two reasons:
  1. We want to cast return types of the operation outcome to symbols or other derived classes thereof.
  2. We do not necessarily want all of our operations to be sent to the NeSy engine and might need to implement a default behavior.
This is achieved through the _sym_return_type method, which can provide contextualized behavior based on the defined return type. More details can be found in the actual Symbol class.
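The mechanics of such an operation decorator — cast the outcome to the annotated return type, fall back to a default on failure — can be sketched in plain Python. All names below (operation, engine_call) are illustrative assumptions, not the actual symai.core implementation:

```python
import functools

# Illustrative operation decorator: (1) cast the engine outcome to the
# annotated return type, (2) fall back to the wrapped body / default on failure.
def operation(prompt, default=None):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return_type = func.__annotations__.get("return", str)
            try:
                raw = engine_call(prompt, *args, **kwargs)  # hypothetical engine
                return return_type(raw)                     # (1) cast
            except Exception:
                fallback = func(*args, **kwargs)            # user-defined body
                return fallback if fallback is not None else default  # (2)
        return wrapper
    return decorator

def engine_call(prompt, *args, **kwargs):
    raise RuntimeError("no engine configured")  # force the fallback path here

@operation(prompt="Add the two numbers.", default=0)
def add(a, b) -> int:
    return None  # no local default behavior; use the decorator's default

print(add(1, 2))  # falls back to the decorator default: 0
```

This mirrors the two reasons listed above: the annotated return type drives the cast, and the wrapped body provides the non-engine default behavior.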
In the context of LLMs, zero- and few-shot learning domains have emerged as essential techniques (Yao et al., 2023b; Shinn et al., 2023; Kim et al., 2023; Wei et al., 2022b; Lyu et al., 2023; Pitis et al., 2023; Madaan et al., 2023; Wang et al., 2022; Ye et al., 2023) 7 to enable models to generalize from limited training data and adapt to new tasks, as this reduces the need for large labeled data sets and allows for faster deployment in new applications. In this section, we demonstrate how our Symbolic API incorporates Python decorators to define custom operations in the zero- and few-shot domains.
Consider the following example, where we define a custom operation to generate a random integer between 0 and 10 using the Symbolic API and Python decorators:

import symai

class Demo(symai.Symbol):
    def __init__(self, value='') -> None:
        super().__init__(value)

    @symai.core.zero_shot(prompt="Generate a random integer between 0 and 10.",
                          constraints=[
                              lambda x: x >= 0,
                              lambda x: x <= 10
                          ])
    def get_random_int(self) -> int:
        pass

In this example, the @symai.core.zero_shot decorator is used to define a custom operation that does not require any examples, as the prompt is expressive enough. The zero-shot decorator takes in two arguments: prompt and constraints. The prompt defines the conditioning for our desired operation behavior, while the constraints are used to validate the computed outcome, ensuring it meets our expectations. If the constraints are not fulfilled, the implementation resorts to the specified default implementation or the default value; if neither is provided, an exception is raised. The resulting value from the wrapped function must be of type int because of the auto-casting to the user-specified return type. If the cast fails, the Symbolic API raises a ValueError. If no return type is specified, the return type defaults to Any.
The @symai.core.few_shot decorator is a generalized version of @symai.core.zero_shot and is used to define custom operations requiring examples. The function signature of the few_shot decorator is as follows:

def few_shot(prompt: str,
             examples: Prompt,
             constraints: List[Callable] = [],
             default: Any = None,
             limit: int = 1,
             pre_processor: Optional[List[PreProcessor]] = None,
             post_processor: Optional[List[PostProcessor]] = None,
             **wrp_kwargs):

The behavior of the prompt and constraints attributes is similar to the zero_shot decorator. The examples and limit arguments are new, with examples defining a list of instructions conditioning the NeSy engine, and limit specifying the maximum number of examples returned. The pre_processor and post_processor arguments accept lists of PreProcessor and PostProcessor objects, respectively, which are utilized to process the input before it is fed into the NeSy engine and the output before it is returned to the user. The wrp_kwargs argument passes additional arguments to the wrapped method, streamlining them towards the NeSy engine or other engines.
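The pre/post-processing flow can be pictured as a simple pipeline around the engine call. The plain-Python sketch below is illustrative only; run_operation, fake_engine, and the prompt assembly are assumptions, not the symai implementation:

```python
# Illustrative pipeline: pre-processors shape the input, the engine is queried
# with prompt + examples, and post-processors shape the raw output.
def run_operation(value, prompt, examples, pre_processors, post_processors, engine):
    data = value
    for pre in pre_processors:            # input -> engine-ready form
        data = pre(data)
    raw = engine(f"{prompt}\n" + "\n".join(examples) + f"\n{data}")
    for post in post_processors:          # raw output -> user-facing form
        raw = post(raw)
    return raw

# A fake echo engine so the sketch runs without any NeSy backend:
fake_engine = lambda full_prompt: full_prompt.splitlines()[-1].upper()

result = run_operation(
    value="  hello ",
    prompt="Shout the input.",
    examples=["'hi' => 'HI'"],
    pre_processors=[str.strip],
    post_processors=[lambda s: s + "!"],
    engine=fake_engine,
)
print(result)  # HELLO!
```

Swapping fake_engine for a real backend leaves the pre/post pipeline untouched, which is the point of separating the three stages.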

E.8 PROMPTING

In this section, we discuss the design of prompts and their role in shaping the behavior of operations. Prompts serve as containers for the information necessary to define specific operations, and the Prompt class serves as the base class for all other Prompt classes. Consider the following example, where we define a Prompt for comparing two values through the NeSy engine. When the <= operation on two Symbol objects is resolved, the NeSy engine evaluates them in the context of the CompareValues prompt.

7 This is by no means an exhaustive list; we only point the reader to some very influential and recent research.

class CompareValues(symai.Prompt):
    def __init__(self) -> symai.Prompt:
        super().__init__([
            "4 > 88 =>False",
            "-inf < 0 =>True",
            "inf > 0 =>True",
            "4 > 3 =>True",
            "1 < 'four' =>True",
            ...
        ])

Evaluating a fuzzy comparison statement:

res = symai.Symbol(1) <= symai.Symbol('one')

Output of the evaluation:

[Output]:
True
This evaluation returns True, as the fuzzy comparison operation conditions the engine to compare the two Symbol objects based on their semantic meaning. More generally, the semantics of Symbol operations may vary depending on the context hierarchy of the Expression class and the operations used. We used three main prompt designs: Context-based Prompts, Operational Prompts, and Templates. Prompts can be curated either through inheritance or composition. For instance, the static context can be defined by inheriting from the Expression class and overwriting the static_context property, while an Operation and Template prompt can be created by providing a PreProcessor to modify the input data.
We will now provide a more detailed explanation for each prompt design:
  1. Context-based Prompts are considered optional and can be defined in a static manner, either by sub-classing the Expression class and overwriting the static_context property, or at runtime by updating the dynamic_context property or passing a payload kwargs to a method. Below is an example of payload kwargs through the method signature:

# create a query to ask if an issue was resolved or not
s = symai.Symbol("")
q = s.query("Was the issue resolved?")
# write a manual condition to check if the issue was resolved
if 'not resolved' in q:
    # do a new query, passing the previous query answer as payload to the new query
    s.query("What was the resolution?", payload=q)
    ...
else:
    pass  # all good

Regardless of how the context is set, the contextualized prompt defines the desired behavior of Expression operations. For instance, if we want to operate in the context of a DSL without having to overwrite each base class method, we can utilize this approach 8.
  2. Operational Prompts define the behavior of an atomic operation and are therefore mandatory to express the nature of such an operation. For example, the + operation is used to add two Symbol objects together, and the + operation prompt explains its behavior. The examples kwargs provide another optional structure that conditions the NeSy engine with a set of instructions.
  3. Template Prompts are optional and encapsulate the resulting prediction to enforce a specific format. For example, to generate HTML tags, we can utilize a curated ... template. This template conditions the NeSy engine to begin the generation process already in the context of an HTML tag format, rather than producing irrelevant descriptions about its task.
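The three prompt designs can be pictured with a minimal plain-Python sketch. The property and attribute names follow the conventions described above, but the class bodies, the SqlExpression example, and the template string are illustrative assumptions:

```python
# Illustrative sketch of the three prompt designs; not the symai implementation.
class Expression:
    _static_context = ""          # 1. context-based: fixed by subclassing...
    dynamic_context = ""          #    ...or updated at runtime

    @property
    def static_context(self):
        return self._static_context

class SqlExpression(Expression):
    # Static context set by inheritance: every operation now runs "inside" a DSL.
    _static_context = "You operate on SQL statements."

OPERATION_PROMPTS = {             # 2. operational: mandatory, per atomic operation
    "+": "Add the two values semantically.",
}

TEMPLATE = "<html>{}</html>"      # 3. template: wraps the prediction in a format

expr = SqlExpression()
full_prompt = f"{expr.static_context}\n{OPERATION_PROMPTS['+']}"
print(full_prompt)                # what conditions the engine for '+'
```

In the real framework the template additionally seeds the generation so the engine starts emitting inside the target format instead of describing its task.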

8 See more details in this notebook.

E.9 COMPLEX EXPRESSIONS

We will now attempt to obtain logical answers based on questions of the kind:
  • A line parallel to y = 4x + 6 passes through (5, 10). What is the y-coordinate of the intercept?
  • Bob has two sons, John and Jay. Jay has one brother and father. The father has two sons. Jay's brother has a brother and a father. Who is Jay's brother?
  • Is 1000 bigger than 1063.472?
To solve these tasks, we would initially employ a series of operations to identify the most suitable engine for handling the specific requirements. Subsequently, we would prepare the input tailored to the selected engine.

val = ""
# First define a class that inherits from the Expression class
class ComplexExpression(symai.Expression):
    # write a method that returns the causal evaluation
    def causal_expression(self):
        pass

# instantiate an object of the class
expr = ComplexExpression(val)
# set WolframAlpha as the main expression engine to use
expr.command(engines=['symbolic'], expression_engine='wolframalpha')
# evaluate the expression
res = expr.causal_expression()

The implementation of causal_expression could in principle look like this:

def causal_expression(self):
    if self.isinstanceof('mathematics'):
        # get the mathematical formula
        formula = self.extract('mathematical formula')
        # verify which problem type we have
        if formula.isinstanceof('linear function'):
            # prepare the query for WolframAlpha
            question = self.extract('question sentence')
            req = question.extract('what is requested?')
            # get the coordinate point; could also ask for other points
            x = self.extract('coordinate point (.,.)')
            # concatenate the question and formula
            query = formula | f', point x = {x}' | f', solve {req}'
            res = query.expression(query)
        elif formula.isinstanceof('number comparison'):
            res = formula.expression()  # send directly to WolframAlpha
        ...  # more cases
    elif self.isinstanceof('graph construction'):
        sentences = self / '.'  # first, split into sentences
        graph = {}  # define the graph
        for s in sentences:
            sym = symai.Symbol(s)
            relations = sym.extract(  # extract and split by pipe
                'connected entities (e.g. A has three B => A | A: three B)') / '|'
            ...  # add relations and populate the graph
    ...  # more cases
    return res
The aforementioned example demonstrates the utilization of the causal_expression method, which allows us to extract information that can be resolved either manually or through external solvers, such as WolframAlpha. Initially, when utilizing the GPT-3 backend, we anticipated a significant engineering effort to develop such a complex expression, as the GPT-3 backend frequently struggled with accurate information extraction and comparison resolution. However, we remained confident in the field's progress, specifically with RLHF fine-tuned models like ChatGPT. We were delighted to witness these challenges being further tackled by the latest GPT-4 model (OpenAI, 2023).
Furthermore, it is worth highlighting that, given sufficient data, we could refine methods for information extraction or knowledge graph construction from natural language, enabling more intricate reasoning tasks, such as those previously mentioned. We also direct readers to recent publications on Text-to-Graph translations, especially the very influential CycleGT (Guo et al., 2020). This approach allows us to answer queries by simply traversing the graph and extracting the required information.
Lastly, recent research (Kıcman et al., 2023; Ellis, 2023) has demonstrated that algorithms based on GPT-3.5 and GPT-4 establish new state-of-the-art accuracy on multiple causal benchmarks, while also exhibiting unique capabilities previously considered exclusive to humans, such as generating causal graphs and identifying background causal context from natural language. This points to the potential for LLMs to be used alongside existing causal methods as proxies for human domain knowledge, reducing human effort in setting up causal analyses and ultimately accelerating the widespread adoption of causal methods. Moreover, recent advances in LLMs have opened new frontiers for research, practice, and adoption of causal reasoning, transforming the way causal analysis is conducted and broadening the scope of applications for our framework.
One of the most prominent illustrations of this concept is exhibited by Word2Vec (Mikolov et al., 2013a). Word2Vec generates dense representations of words by training a shallow neural network to predict a word based on its neighboring words within a text corpus. These resulting vectors are extensively utilized in various natural language processing applications, including sentiment analysis, text classification, and clustering.
Drawing parallels with Word2Vec, our objective is to execute contextualized operations on different symbols. However, the key distinction lies in the fact that we operate within the natural language domain, as opposed to a vector space. Consequently, this grants us the capability to conduct arithmetic on words, sentences, paragraphs, and the like, while simultaneously validating the outcomes in a human-readable format.
In the following example, we illustrate the methodology for evaluating such an expression through a string representation:
s = symai.Symbol('King - Man + Woman')
s.expression()

[Output]:
(value=Queen)

In contrast to the Symbol object, the Expression represents a non-terminal symbol. It allows for further evaluation and extends the Symbol class by overwriting the __call__ method. It serves as the foundation for all other expressions and possesses additional capabilities, namely to fetch data from URLs, search the internet, or open files. These operations are intentionally separated from Symbol, as they do not utilize the value attribute of the Symbol class.
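The non-terminal nature of an Expression can be sketched by overriding __call__ in plain Python. The classes below are illustrative stand-ins, not the symai implementation:

```python
# Illustrative: an Expression is a Symbol that can be evaluated further.
class Symbol:
    def __init__(self, value):
        self.value = value

class Expression(Symbol):
    def __call__(self, *args, **kwargs):
        # Evaluation is deferred to forward(), which subclasses override.
        return self.forward(*args, **kwargs)

class Uppercase(Expression):
    def forward(self):
        return Symbol(str(self.value).upper())  # evaluation yields a terminal

print(Uppercase("queen")().value)  # QUEEN
```

Calling the object triggers evaluation and yields a (terminal) Symbol, which is exactly the non-terminal/terminal distinction drawn above.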

E.10 COMPOSITION

E.11 SEQUENCES
Sequences offer a multitude of advantages in the realm of Expression objects, as they facilitate the creation of more sophisticated structural configurations. With the Sequence expression, multiple expressions can be effectively evaluated at runtime, enhancing the flexibility, modularity, and adaptability of the framework.

# first import all expressions
from symai.components import *
# define a sequence of expressions
Sequence(
    Clean(),
    Translate(),
    Outline(),
    Compose(),
)

E.12 STREAMS

As demonstrated earlier, creating contextualized prompts refines the behavior of operations in the NeSy engine. However, this also consumes a considerable portion of the available context size. Given a limited context size, this constraint may pose challenges. Fortunately, the Stream processing expression offers a solution by opening a data stream and performing chunk-based operations on the input stream. Stream expressions can encapsulate other expressions. For instance, chunks can be managed through a Sequence expression, which permits multiple compositional operations sequentially. The example below illustrates the definition of a Stream expression:

Stream(
    Sequence(
        Clean(),
        Translate(),
        Outline(),
        Embed()
    )
)
In this case, a stream is opened and a Sequence expression is passed, which cleans, translates, outlines, and embeds the input. Internally, the stream operation estimates the available model context size and segments the lengthy input text into smaller chunks transmitted to the inner expression. The returned object type is a generator.
The limitation of this approach is that the resulting chunks are processed independently, lacking shared context or information among them. To address this, the Cluster expression can be employed, merging the independent chunks based on their similarity, as illustrated in Figure 12.

s = symai.Symbol('')
res = symai.Symbol(list(stream(s)))
expr = Cluster()
expr(res)
Figure 12: Stream processing expression in NeSy engine, illustrating data stream segmentation into chunks, each undergoing operations like cleaning, outlining, and embedding. The Cluster expression then merges chunks based on similarity, allowing contextually related information to be consolidated meaningfully. Node summaries are generated by extracting key labels from each cluster's content, overcoming context size limitations and maintaining shared information among processed chunks.
By merging individual chunks by clustering their contents, contextually related information can be consolidated in a meaningful manner. Additionally, the clustered information can be labeled by streaming through each cluster's content and extracting the most pertinent labels, yielding interpretable node summaries.
The complete example is depicted as follows:

stream = Stream(
    Sequence(
        Clean(),
        Translate(),
        Outline(),
    )
)
Subsequently, this process can be recursively repeated on each summary node to construct a hierarchical clustering structure. As each node represents a summarized subset of the original information, the summary can function as an index. The resulting tree can be utilized to navigate and retrieve the original information, transforming the large data stream problem into a search problem. Alternatively, vector-based similarity searches can be employed to identify similar nodes. For searching within a vector space, dedicated libraries such as Annoy (Spotify, 2017), Faiss (Johnson et al., 2019), or Milvus (Wang et al., 2021a) can be used.
In summary, Stream expressions offer the advantage of processing large data streams in a more efficient and organized manner, while also enabling the integration with other expressions like Sequence and Cluster expressions. These combinations allow for a more effective approach to handling context limitations, promoting the extraction of meaningful information and improving the overall performance of the framework.
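The chunk-based streaming described above can be pictured as a generator that segments the input to fit a context budget. The sketch below is plain Python and illustrative only: the fixed character budget stands in for the model-context estimate, and the per-chunk expression is a toy:

```python
# Illustrative sketch of Stream-style chunk processing: segment the input into
# context-sized chunks and lazily apply an inner expression to each chunk.
def stream(text, chunk_size=20, expression=str.strip):
    """Yield the expression applied to each fixed-size chunk (a generator)."""
    for start in range(0, len(text), chunk_size):
        yield expression(text[start:start + chunk_size])

long_text = "lorem ipsum " * 10   # 120 characters
chunks = list(stream(long_text))
print(len(chunks))                # 6 independently processed chunks
```

Because the result is a generator, chunks are produced lazily; the independence of chunks is exactly the limitation the Cluster expression addresses.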

E.13 ERROR HANDLING, DEBUGGING, AND EXPLAINABILITY

Effective error handling and debugging are essential for ensuring the robustness and reliability of any software system, while explainability is essential for understanding the underlying behavior of the system, especially in the context of AI-driven applications. By developing a system that is both transparent and interpretable, we can gain valuable insights into the performance of the NeSy engines and identify potential areas for improvement.

E.14 ERROR HANDLING

One of the fundamental aspects of the SymbolicAI API is being able to generate code. Consequently, errors may arise, and handling them contextually becomes vital. In pursuit of a self-evolving API, we introduce the Try expression, which includes built-in fallback statements and automatically retries execution after performing dedicated error analysis and correction. This expression analyzes both the input and the error, conditioning itself to resolve the error by modifying the original code 9. If the fallback expression succeeds, the result is returned; otherwise, the process is repeated for the specified number of retries. If the maximum number of retries is reached without resolving the issue, the error is raised again.
Consider the example of executing previously generated code that contains a syntax error. By employing the Execute expression, we can evaluate the generated code, which takes a symbol and proceeds with the execution. Despite the initial failure, the Try expression resolves the syntactic error, returning the corrected and evaluated code:
expr = Try(expr=Execute())
s = symai.Symbol('a = int("3,")')  # some code with a syntax error
expr(s)
[Output]:
a = 3
While not all errors can be resolved as easily as the demonstrated syntax error, we continue to explore more sophisticated error handling mechanisms, including streams and clustering to address errors in a hierarchical and contextual manner.
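The retry-with-correction loop behind Try can be sketched in plain Python. Here, correct_fn stands in for the NeSy engine's error analysis; try_with_correction and the toy fix lambda are illustrative assumptions:

```python
# Illustrative retry loop: run the payload, and on failure hand the code plus
# the error to a correction function (the NeSy engine's role) before retrying.
def try_with_correction(code, correct_fn, retries=3):
    for _ in range(retries + 1):
        try:
            scope = {}
            exec(code, scope)             # stand-in for the Execute expression
            return scope
        except Exception as err:
            code = correct_fn(code, err)  # engine-style contextual repair
    raise RuntimeError("maximum retries reached")

# A toy "correction engine" that fixes the malformed int literal from above:
fix = lambda code, err: code.replace('int("3,")', 'int("3")')

result = try_with_correction('a = int("3,")', fix)
print(result["a"])  # 3
```

The real Try expression conditions the engine on both the original input and the raised error, so the repair is contextual rather than rule-based as in this toy.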

F EVALUATION DETAILS

In a computational graph, the VERTEX score compares the distribution of the generated model answer at each node against a reference distribution, sampling multiple valid trajectories at each node for the reference distribution and accounting for randomness through some predefined random trajectories. For instance, one of the predefined random trajectories in our benchmark was the string of printable ASCII characters, namely 0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~. Moreover, the VERTEX score is particularly well suited for the evaluation of multi-step

9 This is similar to the recently released Auto-GPT application.

workflows and in contexts where the solution space is or is expected to be diverse. We will now proceed by describing in detail the tasks that we defined in our benchmark.
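The idea of scoring a generated node answer against reference trajectories while normalizing by a random baseline can be sketched in plain Python. The similarity measure (token-set Jaccard) and the normalization below are illustrative assumptions, not the actual VERTEX implementation:

```python
# Illustrative node-level score: similarity of the answer to reference samples,
# rescaled so that a random-trajectory baseline maps to zero.
def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def node_score(answer, references, random_trajectory):
    ref = max(jaccard(answer, r) for r in references)   # best reference match
    base = jaccard(answer, random_trajectory)           # randomness baseline
    return max(0.0, (ref - base) / (1.0 - base)) if base < 1.0 else 0.0

refs = ["it is sunday", "today is sunday"]
print(node_score("it is sunday", refs, "xyz 123"))  # 1.0 for an exact match
```

Sampling several valid reference trajectories per node, as described above, is what makes such a score robust when the solution space is diverse.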

F.1 ASSOCIATIVE PREDICTION

We defined a total of 15 tasks involving in-context associations between two Symbol instances. SymbolicAI's overloaded operators rely on predefined pseudo-grammars, as described in Section 4, that augment the operators with few-shot examples. For instance, the overloaded operator + utilized between two Symbol instances provides few-shot examples of how to resolve additions with various data types:

"1 + 2 =>3",
"17 + 'pi' =>20.1415926535...",
"7.2 + 'five' =>12.2",
"True + 0 =>False",
"False + 'True' =>False",
"['a', 'b'] + ['c', 'd'] =>['a', 'b', 'c', 'd']",
"['apple'] + 'banana' =>['apple', 'banana']",
"'Zero' + 1 =>1",
"'One' + 'Two' =>3",
"'Three' + 4 =>7",
"'a + b' + 'c + d' =>a + b + c + d",
...

Consequently, we can now test whether the models can resolve the addition between Symbol("two hundred and thirty four") and Symbol(7000).
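Since the actual result depends on the queried model, the expected semantics can be mimicked with a deterministic stand-in (the word-to-number mapping below is a hypothetical illustration, not part of the framework):

```python
# Deterministic stand-in for the LLM-backed '+' operator (illustration only).
WORD_NUMBERS = {"two hundred and thirty four": 234}

def semantic_add(a, b):
    to_int = lambda v: WORD_NUMBERS[v] if isinstance(v, str) else v
    return to_int(a) + to_int(b)

# mirrors Symbol("two hundred and thirty four") + Symbol(7000)
result = semantic_add("two hundred and thirty four", 7000)  # 7234
```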

F.2 MULTI-MODAL BINDING

We perform transformations between multiple modalities through language-based representations. Therefore, we need to evaluate the model's proficiency in tool utilization and in the classification and routing of requests to the relevant modules. We define a multi-modal Expression that detects the category of a task based on its content and forwards the task to the appropriate tool. The expression creates interfaces to tools such as WolframAlpha for mathematical expressions, Selenium for website content scraping, SerpApi for search queries, and APILayer for optical character recognition. Each of the five tests aims to evaluate the appropriate handling of a specific type of input by the multi-modal Expression type, such as processing a website URL for scraping, interpreting a search engine query, testing if two vectors are linearly independent, comparing large numbers, and extracting text from an image. The following example shows the forward function of the MultiModalExpression implementation, which uses the isinstanceof operator on its own context to determine the current expression value and select the sub-routine that can evaluate the request.

class MultiModalExpression(Expression):
    def forward(self, ...):
        formula = self.extract('mathematical formula')
        if self.isinstanceof(LINEAR_ALGEBRA):
            ...
            res = self.solver(formula)
            res = res.query('write a one sentence summary of the answer')
            ...
            res = self.solver(formula)  # send directly to wolframalpha
        else:
            ...

query = Symbol("is 100044347 bigger than 129981063.472?")
expr = MultiModalExpression(query)
res = expr()

F.3 PROGRAM SYNTHESIS

We designed three separate tests related to program synthesis, where each task assesses the ability of the models to generate and execute code based on natural language instructions or provided templates: 2) The second task tests automatic code generation for API calls by fetching data from a specified URL and extracting specific information from the retrieved content. 3) The third task evaluates the ability to construct a custom Expression that processes a Symbol through a specific Function component from the SymbolicAI package. Each of the three tests follows a similar pattern, where the generated code is scored based on its similarity to valid references and normalized with random samples. Figure 13 shows possible samples from the third task category.
import sympy as sym

a, b, c, d, x, y = sym.symbols('a b c d x y')
expr = a * x + b * x - c * x - a * y - b * y + c * y + d
# validate with sympy
fact = sym.collect(expr, d, func=sym.factor)
# model-based factorization
func = Factorization("Factorize d from the expression such that your final result starts with: 'd + ('")
# compare res with fact
...
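Independent of the model output, the sympy reference computation in the snippet above can be checked for algebraic consistency (a small verification sketch, not part of the benchmark code):

```python
import sympy as sym

a, b, c, d, x, y = sym.symbols('a b c d x y')
expr = a * x + b * x - c * x - a * y - b * y + c * y + d
fact = sym.collect(expr, d, func=sym.factor)
# the collected form factors the d-free part: d + (a + b - c)*(x - y)
assert sym.simplify(sym.expand(fact) - expr) == 0
```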

2) Three tasks evaluate the models' capability to resolve the logical operations AND, OR, and XOR. For instance, the test for logical AND combines Symbol("The horn only sounds on Sundays") and Symbol("I hear the horn") and compares the answer against human-generated references. Given the wide range of possible solutions, there is high variability in the solution space, and each model might prefer a different solution.
The following snippet shows how one can define a custom primitive class (CustomLogicPrimitive) for logic operators. The __or__ function gets overloaded and uses the built-in logic decorator from the core package to create a local function that evaluates two Symbol instances.

from typing import Any
from symai import core
from symai.ops.primitives import Primitive

class CustomLogicPrimitive(Primitive):
    def __or__(self, other: Any) -> Any:
        @core.logic(operator='or')
        def _func(_, a: str, b: str):
            pass  # could impl. a fallback behavior here
        return self._to_symbol(_func(self, other))
    ...

subject = 'cat'
res = (Symbol(f'The {subject} has whiskers.', primitives=CustomLogicPrimitive) |
       Symbol(f'The {subject} has a tail.', primitives=CustomLogicPrimitive))
3) For another task, we use a custom Expression that defines a DSL syntax and semantic structure. We use this Expression to extract higher-order logic expressions from a natural language statement, namely the puzzle 'Who is Jay's brother?'10, such that the original relationships are preserved. The following is a DSL snippet of the 'Who is Jay's brother?' puzzle:

// Query
IsBrotherOf(jay, john, bob) <- BrotherOf(jay, john) AND FatherOf(bob, jay) AND FatherOf(bob, john);
// Facts
BrotherOf(x, y) <- HAS(x, brother) AND HAS(y, brother) AND Sibling(x, y);
FatherOf(x, y) <- HAS(x, son) AND ParentOf(x, y);
ParentOf(x, y) <- IS(x, parent) AND IS(y, child);
Sibling(x, y) <- IS(x, father) AND IS(y, father) OR IS(x, mother) AND IS(y, mother);
...

4) For the final task, we again use the puzzle 'Who is Jay's brother?' to evaluate the models' capability for complex conversions. We use the Z3 theorem prover (Moura & Bjørner, 2008) to solve the 'Who is Jay's brother?' puzzle conditioned on the Z3 solver's solution to Einstein's famous puzzle 'Who owns the fish?'. The task involves an indirect translation from natural language to code executable by the Z3 solver; the solution to Einstein's puzzle acts as a form of self-contained "documentation" for how the Z3 solver should be utilized. The test constructs a template, which includes the task instructions, the puzzle statement, and a reference to the solution of Einstein's puzzle. The models are then asked to analyze the given problem and solution format and create a Python function with Z3 syntax that can solve the 'Who is Jay's brother?' puzzle. The dynamically generated code is executed within the test environment utilizing Python's exec function. We check the access to the Z3 solver and run the generated solve_puzzle

10 Bob has two sons, John and Jay. Jay has one brother and father. The father has two sons. Jay's brother has a brother and a father. Who is Jay's brother?

function, which is supposed to contain the logic to solve the puzzle. Once executed, the assembled Z3 logical clauses are processed by the solver, which verifies that the set of constraints is satisfiable. If so, the model generated by the solver is queried for the puzzle's solution, and the result is scored using our VERTEX score. The following is an example output of the Z3 representation of the solution to the 'Who is Jay's brother?' puzzle:

from z3 import Solver, Bool, And, Not, Const, BoolSort, EnumSort, Function, IntSort

def solve_puzzle(S: Solver) -> Const:
    # Define the enumeration sort for the individuals
    Person, (BobE, JohnE, JayE, JaysBrotherE, FatherE) = EnumSort(
        'Person', ['Bob', 'John', 'Jay', 'JaysBrother', 'Father'])
    # Define a function from persons to booleans (for brother status)
    is_brother = Function('is_brother', Person, BoolSort())
    # Define the relationships
    ...
    return query
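As a sanity check, the puzzle itself is small enough to be solved without a theorem prover; the following brute-force sketch (a hypothetical relation encoding, not the benchmark code) confirms the expected answer:

```python
# Encode the puzzle facts: Bob has two sons, John and Jay.
people = ['Bob', 'John', 'Jay']
father = {'John': 'Bob', 'Jay': 'Bob'}

def is_brother(x: str, y: str) -> bool:
    # two distinct people sharing the same father are brothers
    return x != y and father.get(x) is not None and father.get(x) == father.get(y)

jays_brother = [p for p in people if is_brother(p, 'Jay')]
print(jays_brother)  # ['John']
```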

F.5 HIERARCHICAL COMPUTATIONAL GRAPHS

In this section, we expand on the hierarchical computational graphs section.
Research Paper Draft The following example defines a Paper expression that takes in a sequence of expressions which are executed sequentially. The Method expression contains a Source expression which points to the actual human-written method. The Method expression acts as the root node that bootstraps the generation process. The RelatedWork expression contains a sequence of Cite expressions which are executed in parallel and utilized to define the context of the related work section. The Abstract and Title expressions are executed last because they require all the previous information to be available in their respective contexts. The expression holds a linker object which is utilized after the execution to retrieve individual results from the nodes of the expression's computational graph. Each node was evaluated against its corresponding reference, with all references representing actual sections from this research paper. The samples for each node were generated with a separate model (Claude 2) that was not part of this evaluation. In Figure 5 we show the resulting computational graph of the Paper expression.

# define the computational graph
expression = Paper(
    Method(
        # link to original code base where the main method is defined
        Source(file_link='/path/to/.../file'),
    ),
    # gather resources and write the related work
    RelatedWork(
        Cite(file_link='/path/to/.../file'),
        Cite(file_link='/path/to/.../file'),
        ...
    ),
    # write the abstract and title
    Abstract(),
    Title(),
)
paper_result = expression('Write a scientific paper')
# access linker to retrieve the results from the method expression
method = expression.linker.find('Method')
# print result of the method expression
print(method)

F.6 CAVEATS
One may have noticed that there are cases in which there is no need to draw multiple samples because there is only one expected answer, for instance, when we need to extract a specific number from a string and cast it to an integer. If the extraction appends any characters other than the number, the cast will fail. In such cases, the VERTEX score simply defaults to the chosen similarity measure, registering and penalizing any deviation from the expected answer.
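For illustration, such a single-answer case reduces to the following sketch (assuming exact match as the chosen similarity measure; the benchmark's concrete measure may differ):

```python
def score_single_answer(generated: str, expected: int) -> float:
    # the cast fails if the extraction appended characters other than the number
    try:
        return 1.0 if int(generated) == expected else 0.0
    except ValueError:
        return 0.0  # any deviation from the expected answer is penalized

print(score_single_answer("42", 42))   # 1.0
print(score_single_answer("42.", 42))  # 0.0 (stray character breaks the cast)
```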

Chapter 3 Conclusion and Outlook

The main part of this work involves the analysis and development of parameter choice and neuro-symbolic (NeSy) approaches for datasets with different characteristics. It results in the realization of state-of-the-art methods for unsupervised domain adaptation (UDA) (Zellinger et al., 2021; Dinu et al., 2023), a benchmark for large-scale empirical evaluations on several datasets from different domains (Dinu et al., 2023), and the introduction of a NeSy framework for domain-invariant learning without gradient-based updates (Dinu, 2022; Dinu et al., 2024a,b).
This work shows a connection between domain-invariant learning and in-context learning. It also introduces SymbolicAI, a logic-based framework that leverages large language models (LLMs) as semantic parsers to create complex computational graphs, combining symbolic expressions with probabilistic programming paradigms for concept learning and flow management in generative processes, and it develops a novel quality measure and benchmark for evaluating these multi-step generative processes across various complex tasks. We thereby advance the integration of symbolic and sub-symbolic paradigms and open new avenues for NeSy architectures that can create in-context associations and behavioral adaptations among abstract representations.

3.1 Future Work

Our future work aims to build upon these foundations by extending our UDA benchmark to include additional domains and domain adaptation (DA) methods.
The connection between in-context learning and domain-invariant learning supports researchers in approximation theory in analyzing the stability and generalizability of learning models across varied domains without retraining.
In continuation of the development of our NeSy framework, we plan to explore agentic approaches based on LLMs and use quantitative measures for capturing the reasoning capabilities of such agents. By improving the interpretability and robustness of our framework, we aim to ensure its practical applicability in real-world scenarios for autonomous tasks. Furthermore, we see potential in differentiable computational graphs, synthetic data augmentation of workflows, and the use of the VERTEX score (Dinu et al., 2024a) as a reward signal for reinforcement learning optimization. We also plan to extend our NeSy expression language with more interfaces and workflows.
In summary, by grounding our work in DA and NeSy approaches, we aim to develop more general, adaptable, interpretable, and scalable AI systems, capable of addressing a wide array of real-world challenges.

Bibliography

Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. Technical report, Anthropic, July 2024. URL https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf.
L. M. Augusto. Computational Logic. Vol. 1: Classical Deductive Computing with Classical Logic. College Publications, London, 2 edition, 2022.
S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan. A theory of learning from different domains. In Machine Learning, volume 79, 2010.
T. R. Besold, A. S. d. Garcez, S. Bader, H. Bowman, P. Domingos, P. Hitzler, K.-U. Kuehnberger, L. C. Lamb, D. Lowd, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon, and G. Zaverucha. Neural-symbolic learning and reasoning: A survey and interpretation, 2017.
S. Black, G. Leo, P. Wang, C. Leahy, and S. Biderman. GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow. Technical report, Zenodo, Mar. 2021. URL https://doi.org/10.5281/zenodo.5297715.
T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei. Language models are few-shot learners. In H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 1877-1901. Curran Associates, Inc., 2020.
N. Carlini, F. Tramer, E. Wallace, M. Jagielski, A. Herbert-Voss, K. Lee, A. Roberts, T. Brown, D. Song, U. Erlingsson, et al. Extracting training data from large language models. In 30th USENIX Security Symposium (USENIX Security 21), pages 2633-2650, 2021.
G. A. Carpenter and S. Grossberg. Art 2: self-organization of stable category recognition codes for analog input patterns. Appl. Opt., 26(23):4919-4930, Dec 1987. doi: 10.1364/AO.26.004919.
C. Chen, Z. Fu, Z. Chen, S. Jin, Z. Cheng, X. Jin, and X.-S. Hua. Homm: Higher-order moment matching for unsupervised domain adaptation. Association for the Advancement of Artificial Intelligence (AAAI), 2020.
N. Chomsky. On certain formal properties of grammars. Information and Control, 2(2):137-167, 1959. ISSN 0019-9958.
M.-C. Dinu. SymbolicAI: A Neuro-Symbolic Perspective on Large Language Models (LLMs). Technical report, GitHub, 11 2022. URL https://github.com/Xpitfire/symbolicai.
M.-C. Dinu*, M. Hofmarcher*, V. P. Patil, M. Dorfer, P. M. Blies, J. Brandstetter, J. A. Arjona-Medina, and S. Hochreiter. Xai and strategy extraction via reward redistribution. In A. Holzinger, R. Goebel, R. Fong, T. Moon, K.-R. Müller, and W. Samek, editors, xxAI - Beyond Explainable AI: International Workshop, Held in Conjunction with ICML 2020, July 18, 2020, Vienna, Austria, Revised and Extended Papers, pages 177-205, Cham, 2022. Springer International Publishing. ISBN 978-3-031-04083-2. doi: 10.1007/978-3-031-04083-2_10.
M.-C. Dinu, M. Holzleitner, M. Beck, H. D. Nguyen, A. Huber, H. Eghbal-zadeh, B. A. Moser, S. V. Pereverzyev, S. Hochreiter, and W. Zellinger. Addressing parameter choice issues in unsupervised domain adaptation by aggregation. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
M.-C. Dinu, C. Leoveanu-Condrei, M. Holzleitner, W. Zellinger, and S. Hochreiter. Symbolicai: A framework for logic-based approaches combining generative models and solvers. In Conference on Lifelong Learning Agents, PMLR, 2024a.
M.-C. Dinu, C. Leoveanu-Condrei, M. Holzleitner, W. Zellinger, and S. Hochreiter. Symbolicai: A framework for logic-based approaches combining generative models and solvers. GenAI4DM Workshop at The Twelfth International Conference on Learning Representations, 2024b.
I. Donadello, L. Serafini, and A. S. d'Avila Garcez. Logic tensor networks for semantic image interpretation. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 1596-1602, 2017.
A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun. CARLA: An open urban driving simulator. In Proceedings of the 1st Annual Conference on Robot Learning, pages 1-16, 2017.
M. Duan, A. Suri, N. Mireshghallah, S. Min, W. Shi, L. Zettlemoyer, Y. Tsvetkov, Y. Choi, D. Evans, and H. Hajishirzi. Do membership inference attacks work on large language models? arXiv preprint arXiv:2402.07841, 2024.
B. Ehret, C. Henning, M. R. Cervera, A. Meulemans, J. von Oswald, and B. F. Grewe. Continual learning in recurrent neural networks with hypernetworks. arXiv preprint arXiv:2006.12109, 2020.
K. Ellis. Human-like few-shot learning via bayesian reasoning over natural language. arXiv preprint arXiv:2306.02797, 2023.
M. Fang, S. Deng, Y. Zhang, Z. Shi, L. Chen, M. Pechenizkiy, and J. Wang. Large language models are neurosymbolic reasoners. arXiv preprint arXiv:2401.09334, 2024.
Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The journal of machine learning research, 17(1):2096-2030, 2016.
A. S. d. Garcez and L. C. Lamb. Neurosymbolic ai: The 3rd wave. arXiv preprint arXiv:2012.05876, 2020.
A. S. d. Garcez, L. C. Lamb, and D. M. Gabbay. Neural-Symbolic Cognitive Reasoning. Springer Publishing Company, Incorporated, 1 edition, 2008.
A. S. d. Garcez, T. Besold, L. D. Raedt, P. Földiák, P. Hitzler, T. Icard, K. Kühnberger, L. Lamb, R. Miikkulainen, and D. Silver. Neural-symbolic learning and reasoning: Contributions and challenges. In AAAI Conference, 2015.
A. S. d. Garcez, M. Gori, L. C. Lamb, L. Serafini, M. Spranger, and S. N. Tran. Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning. Journal of Applied Logic, 2019.
Google. Gemini: A family of highly capable multimodal models. arXiv preprint arXiv:2312.11805, 2023.
A. Gretton, K. M. Borgwardt, M. J. Rasch, B. Schölkopf, and A. Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(25): 723-773, 2012.
K. Hamilton, A. Nayak, B. Božić, and L. Longo. Is neuro-symbolic AI meeting its promises in natural language processing? a structured review. Semantic Web, pages 1-42, nov 2022. doi: 10.3233/sw-223228.
V. Hartmann, A. Suri, V. Bindschaedler, D. Evans, S. Tople, and R. West. Sok: Memorization in general-purpose large language models. arXiv preprint arXiv:2310.18362, 2023.
D. G. Hays. Chomsky Hierarchy, page 210-211. John Wiley and Sons Ltd., GBR, 2003. ISBN 0470864125.
D. O. Hebb. The organization of behavior: A neuropsychological theory. Wiley, New York, 1949. ISBN 0-8058-4300-0.
M. T. Hicks, J. Humphries, and J. Slater. Chatgpt is bullshit. Ethics and Information Technology, 26(2):38, 2024. ISSN 1572-8439. doi: 10.1007/s10676-024-09775-5.
S. Hochreiter. Toward a broad AI. Commun. ACM, 65(4):56-57, mar 2022. ISSN 0001-0782.
M. Holzleitner, J. A. Arjona-Medina, M.-C. Dinu, A. Vall, L. Gruber, and S. Hochreiter. A two time-scale update rule ensuring convergence of episodic reinforcement learning algorithms at the example of rudder. NeurIPS Optimization Foundations for Reinforcement Learning Workshop, 2019.
W. A. Howard. The formulae-as-types notion of construction. In J. P. Seldin and J. R. Hindley, editors, To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, pages 479-490. Academic Press, 1980. Original paper manuscript from 1969.
Z. Hu, X. Ma, Z. Liu, E. Hovy, and E. Xing. Harnessing deep neural networks with logic rules. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2410-2420, Berlin, Germany, August 2016. Association for Computational Linguistics.
J. Huang, X. Chen, S. Mishra, H. S. Zheng, A. W. Yu, X. Song, and D. Zhou. Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798, 2024.
L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin, and T. Liu. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv preprint arXiv:2311.05232, 2023.
W. Huang, P. Abbeel, D. Pathak, and I. Mordatch. Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents. CoRR, abs/2201.07207, 2022. arXiv: 2201.07207.
A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao, T. Lavril, T. Wang, T. Lacroix, and W. E. Sayed. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
T. Kaufmann, P. Weng, V. Bengs, and E. Hüllermeier. A survey of reinforcement learning from human feedback. arXiv preprint arXiv:2312.14925, 2024.
H. Kim, G. Papamakarios, and A. Mnih. The lipschitz constant of self-attention. In M. Meila and T. Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 5562-5571. PMLR, 18-24 Jul 2021.
T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa. Large language models are zero-shot reasoners. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022.
L. C. Lamb, A. S. d. Garcez, M. Gori, M. Prates, P. Avelar, and M. Vardi. Graph neural networks meet neural-symbolic computing: A survey and perspective. In AAAI Conference, 2020.
J. Li, E. Chen, Z. Ding, L. Zhu, K. Lu, and H. T. Shen. Maximum density divergence for domain adaptation. CoRR, abs/2004.12615, 2020.
Q. Liu and H. Xue. Adversarial spectral kernel matching for unsupervised time series domain adaptation. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 30, 2021.
X. Liu, C. Yoo, F. Xing, H. Oh, G. E. Fakhri, J.-W. Kang, J. Woo, et al. Deep unsupervised domain adaptation: A review of recent advances and perspectives. APSIPA Transactions on Signal and Information Processing, 11(1), 2022.
R. Manhaeve, S. Dumancic, A. Kimmig, T. Demeester, and L. D. Raedt. DeepProbLog: Neural Probabilistic Logic Programming. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018.
J. Mao, C. Gan, P. Kohli, J. B. Tenenbaum, and J. Wu. The neuro-symbolic concept learner: Interpreting scenes, words, and sentences from natural supervision. In 7th International Conference on Learning Representations, ICLR 2019, 2019.
G. Marcus. The Next Decade in AI: Four Steps Towards Robust Artificial Intelligence. arXiv preprint arXiv:2002.06177, 2020.
G. F. Marcus. The Algebraic Mind: Integrating Connectionism and Cognitive Science. The MIT Press, 2001. ISBN 9780262279086. In Special Collection: CogNet.
J. McAuley. Recommendation on live-streaming platforms: Dynamic availability and repeat consumption. Proceedings of the 15th ACM Conference on Recommender Systems, 2021.
M. McCloskey and N. J. Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. In G. H. Bower, editor, Psychology of Learning and Motivation, volume 24, pages 109-165. Academic Press, 1989. doi: 10.1016/S0079-7421(08)60536-8.
L. McInnes, J. Healy, N. Saul, and L. Großberger. Umap: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29):861, 2018. doi: 10.21105/joss.00861. URL https://doi.org/10.21105/joss.00861.
M. Mermillod, A. Bugaiska, and P. Bonin. The stability-plasticity dilemma: Investigating the continuum from catastrophic forgetting to age-limited learning effects. Frontiers in psychology, 4:504, 2013.
S. Min, X. Lyu, A. Holtzman, M. Artetxe, M. Lewis, H. Hajishirzi, and L. Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? arXiv preprint arXiv:2202.12837, 2022.
L. M. de Moura and N. Bjørner. Z3: An efficient SMT solver. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 337-340. Springer, 2008.
K. Musgrave, S. Belongie, and S.-N. Lim. Unsupervised domain adaptation: A reality check. arXiv preprint arXiv:2111.15672, 2021.
A. Newell and H. A. Simon. Human problem solving. Prentice-Hall, page 920, 1972.
H. Nori, Y. T. Lee, S. Zhang, D. Carignan, R. Edgar, N. Fusi, N. King, J. Larson, Y. Li, W. Liu, R. Luo, S. M. McKinney, R. O. Ness, H. Poon, T. Qin, N. Usuyama, C. White, and E. Horvitz. Can generalist foundation models outcompete special-purpose tuning? case study in medicine. arXiv preprint arXiv:2311.16452, 2023.
NVIDIA. Nvidia gtc 2024 keynote. https://www.nvidia.com/gtc/, 2024. Accessed: 2024-07-05.
OpenAI. GPT-4 Technical Report. arXiv, 2023.
J. von Oswald, E. Niklasson, E. Randazzo, J. Sacramento, A. Mordvintsev, A. Zhmoginov, and M. Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, pages 35151-35174. PMLR, 2023.
L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. L. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, J. Schulman, J. Hilton, F. Kelton, L. E. Miller, M. Simens, A. Askell, P. Welinder, P. F. Christiano, J. Leike, and R. J. Lowe. Training language models to follow instructions with human feedback. arXiv preprint arXiv:2203.02155, 2022.
A. Patel, M. Hofmarcher, C. Leoveanu-Condrei, M.-C. Dinu, C. Callison-Burch, and S. Hochreiter. Large language models can self-improve at web agent tasks. In Advances in Neural Information Processing Systems (under review), 2024.
V. Patil*, M. Hofmarcher*, M.-C. Dinu, M. Dorfer, P. M. Blies, J. Brandstetter, J. Arjona-Medina, and S. Hochreiter. Align-RUDDER: Learning from few demonstrations by reward redistribution. In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors, Proceedings of the 39th International Conference on Machine Learning, volume 162 of Proceedings of Machine Learning Research, pages 17531-17572. PMLR, 17-23 Jul 2022.
X. Peng, Q. Bai, X. Xia, Z. Huang, K. Saenko, and B. Wang. Moment matching for multi-source domain adaptation. CoRR, abs/1812.01754, 2018.
M. Qu and J. Tang. Probabilistic logic neural networks for reasoning. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019.
M. Ragab, E. Eldele, W. L. Tan, C.-S. Foo, Z. Chen, M. Wu, C.-K. Kwoh, and X. Li. Adatime: A benchmarking suite for domain adaptation on time series data. ACM Trans. Knowl. Discov. Data, 17(8), may 2023. ISSN 1556-4681. doi: 10.1145/3587937.
M. M. Rahman, C. Fookes, M. Baktashmotlagh, and S. Sridharan. On minimum discrepancy estimation for deep domain adaptation. Domain Adaptation for Visual Understanding, 2020.
H. Ramsauer, B. Schäfl, J. Lehner, P. Seidl, M. Widrich, T. Adler, L. Gruber, M. Holzleitner, M. Pavlović, G. K. Sandve, et al. Hopfield networks is all you need. arXiv preprint arXiv:2008.02217, 2020.
L. Ruis, A. Khan, S. Biderman, S. Hooker, T. Rocktäschel, and E. Grefenstette. Large language models are not zero-shot communicators. CoRR, abs/2210.14986, 2022. doi: 10.48550/arXiv.2210.14986.
K. Saito, D. Kim, P. Teterwak, S. Sclaroff, T. Darrell, and K. Saenko. Tune it the right way: Unsupervised validation of domain adaptation via soft neighborhood density. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9184-9193, 2021.
A. Scherlis, K. Sachan, A. S. Jermyn, J. Benton, and B. Shlegeris. Polyseman-ticity and capacity in neural networks. arXiv preprint arXiv:2210.01892, 2022.
T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomeli, L. Zettlemoyer, N. Cancedda, and T. Scialom. Toolformer: Language models can teach themselves to use tools, 2023.
J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. Lillicrap, and D. Silver. Mastering atari, go, chess and shogi by planning with a learned model. Nature, 588(7839):604-609, 2020. doi: 10.1038/s41586-020-03051-4.
K. Schweighofer, M. Hofmarcher, M.-C. Dinu, P. Renz, A. Bitto-Nemling, V. P. Patil, and S. Hochreiter. Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning. In Deep RL Workshop NeurIPS 2021, 2021.
K. Schweighofer*, A. Radler*, M.-C. Dinu*, M. Hofmarcher, V. P. Patil, A. Bitto-nemling, H. Eghbal-zadeh, and S. Hochreiter. A dataset perspective on offline reinforcement learning. In S. Chandar, R. Pascanu, and D. Precup, editors, Proceedings of The 1st Conference on Lifelong Learning Agents, volume 199 of Proceedings of Machine Learning Research, pages 470-517. PMLR, 22-24 Aug 2022.
D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. P. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016. doi: 10.1038/nature16961.
D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv preprint arXiv:1712.01815, 2017a.
D. Silver, J. Schrittwieser, K. Simonyan, et al. Mastering the game of go without human knowledge. Nature, 550:354-359, 2017b. doi: 10.1038/nature24270.
K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N. Scales, et al. Large language models encode clinical knowledge. Nature, 620(7972): 172–180, 2023.
R. Siripurapu, V. P. Patil, K. Schweighofer, M.-C. Dinu, T. Schmied, L. E. F. Diez, M. Holzleitner, H. Eghbal-Zadeh, M. K. Kopp, and S. Hochreiter. InfODist: Online distillation with informative rewards improves generalization in curriculum learning. In Deep Reinforcement Learning Workshop NeurIPS, 2022.
C. A. Steinparz, T. Schmied, F. Paischer, M.-C. Dinu, V. P. Patil, A. Bitto-Nemling, H. Eghbal-zadeh, and S. Hochreiter. Reactive exploration to cope with non-stationarity in lifelong reinforcement learning. In Conference on Lifelong Learning Agents, pages 441-469. PMLR, 2022.
M. Sugiyama, M. Krauledat, and K.-R. Müller. Covariate shift adaptation by importance weighted cross validation. J. Mach. Learn. Res., 8:985-1005, Dec 2007. ISSN 1532-4435.
M. Sugiyama, T. Suzuki, and T. Kanamori. Density Ratio Estimation in Machine Learning. Cambridge University Press, 2012.
B. Sun, J. Feng, and K. Saenko. Correlation alignment for unsupervised domain adaptation. Domain Adaptation in Computer Vision Applications, pages 153-171, 2017.
Tesla. Tesla AI day 2022 live stream. https://www.teslaclub.at/tesla-ai-day/, 2022. Accessed: 2024-07-05.
H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, and G. Lample. Llama: Open and efficient foundation language models, 2023.
E. Tzeng, J. Hoffman, N. Zhang, K. Saenko, and T. Darrell. Deep domain confusion: Maximizing for domain invariance. arXiv preprint arXiv:1412.3474, 2014.
A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou. Chain of thought prompting elicits reasoning in large language models. In A. H. Oh, A. Agarwal, D. Belgrave, and K. Cho, editors, Advances in Neural Information Processing Systems, 2022.
Y. Weng, M. Zhu, F. Xia, B. Li, S. He, S. Liu, B. Sun, K. Liu, and J. Zhao. Large language models are better reasoners with self-verification. arXiv preprint arXiv:2212.09561, 2023.
G. Widmer and M. Kubat. Learning in the presence of concept drift and hidden contexts. Machine learning, 23:69-101, 1996.
S. Wolfram. What is ChatGPT Doing ... and why Does it Work? Wolfram Media, Incorporated, 2023. ISBN 9781579550813.
Z. Xu, H. van Hasselt, and D. Silver. Meta-gradient reinforcement learning. ArXiv, 2018.
S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601, 2023.
K. You, X. Wang, M. Long, and M. Jordan. Towards accurate model selection in deep unsupervised domain adaptation. In K. Chaudhuri and R. Salakhutdinov, editors, Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 7124-7133. PMLR, 09-15 Jun 2019.
D. Yu, B. Yang, D. Liu, H. Wang, and S. Pan. A survey on neural-symbolic learning systems. Neural Networks, 166:105-126, 2023. ISSN 0893-6080.
W. Zellinger, T. Grubinger, E. Lughofer, T. Natschläger, and S. Saminger-Platz. Central moment discrepancy (cmd) for domain-invariant representation learning. International Conference on Learning Representations, 2017.
W. Zellinger, N. Shepeleva, M.-C. Dinu, H. Eghbal-zadeh, H. D. Nguyen, B. Nessler, S. Pereverzyev, and B. A. Moser. The balancing principle for parameter choice in distance-regularized domain adaptation. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 20798-20811. Curran Associates, Inc., 2021.
H. Zhao, F. Yang, H. Lakkaraju, and M. Du. Towards uncovering how large language model works: An explainability perspective. arXiv preprint arXiv:2402.10688, 2024.
Y. Zhu, F. Zhuang, J. Wang, G. Ke, J. Chen, J. Bian, H. Xiong, and Q. He. Deep subdomain adaptation network for image classification. IEEE Transactions on Neural Networks and Learning Systems, 32(4):1713-1722, 2021.

Appendix A Glossary

AI Artificial intelligence.
DA Domain Adaptation.
IWA Importance Weighted Aggregation.
LLM Large Language Model.
LLMs Large Language Models.
NeSy Neuro-Symbolic.
UDA Unsupervised Domain Adaptation.